Your support is needed and is appreciated as Amigaworld.net is primarily dependent upon the support of its users.
kolla
Re: The (Microprocessors) Code Density Hangout
Posted on 19-Jul-2025 21:04:12 [ #361 ]
Elite Member
Joined: 20-Aug-2003  Posts: 3559
From: Trondheim, Norway

@ppcamiga1
Quote:
all this trolls that want to switch to x86 or arm should hard work on mui on aros
|
Why? All AROS MUI software already works with Zune… why would MUI for AROS make any difference? There would still not be any software for it!
Why? There’s no unix software that would use it, and Amiga software that relies on MUI doesn’t run natively on unix, so what’s the point? MUI can never work on unix, as it relies on AmigaOS features such as shared memory between the programs, which is exactly the opposite of what’s typical for unix, where each program gets its own memory.
Besides that, the functionality of MUI already exists on UNIX, Qt, GTK and whatever Apple is doing, are examples of this.
Quote:
thats only viable way forward
|
It’s a pointless effort.
Why don’t you use unix already? Everything you moan about exists already, only the "labels" are different.
Last edited by kolla on 19-Jul-2025 at 09:05 PM.
_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC
ppcamiga1
Re: The (Microprocessors) Code Density Hangout
Posted on 20-Jul-2025 7:03:07 [ #362 ]
Super Member
Joined: 23-Aug-2015  Posts: 1145
From: Unknown

@kolla
Stop trolling, start working on MUI on AROS.
On ARM, any Amiga-like OS competes with Windows/Android/iOS; it should be as good as Windows/Android/iOS, it should be unix based, it should be just Amiga GUI and graphics on top of unix.
michalsc
Re: The (Microprocessors) Code Density Hangout
Posted on 20-Jul-2025 11:42:48 [ #363 ]
AROS Core Developer
Joined: 14-Jun-2005  Posts: 476
From: Germany

@ppcamiga1
Go, do it yourself. Prove you can do anything.
matthey
Re: The (Microprocessors) Code Density Hangout
Posted on 20-Jul-2025 19:53:11 [ #364 ]
Elite Member
Joined: 14-Mar-2007  Posts: 2828
From: Kansas

ppcamiga1 Quote:
Stop trolling, start working on MUI on AROS.
On ARM, any Amiga-like OS competes with Windows/Android/iOS; it should be as good as Windows/Android/iOS, it should be unix based, it should be just Amiga GUI and graphics on top of unix.
|
Most Amiga fans like the AmigaOS or at least the software compatibility it provides, including most of the people here. You are on the wrong forum and trolling. Choosing Linux is easy and free. Buy your favorite popular hardware and install Commodore OS Vision and an Amiga emulator.
https://www.commodoreos.net/ https://en.wikipedia.org/wiki/Commodore_OS
Most x86-64 hardware will be able to emulate an Amiga with more performance than real PPC AmigaOne hardware, and even some low-end RPi-level hardware comes close. The Commodore OS Vision forum you want is at the following link.
https://forum.commodoreos.net/
You should ask for MUI for Linux from Linux programmers and from a Linux forum like the above link if you are going to ask on a forum. The developers here are primarily AmigaOS programmers. You do not call the pizza shop to order Chinese food do you? You do not go to the clothing store to buy groceries do you? Do you use your head for anything besides a hat rack?
ppcamiga1 Quote:
what you wrote is pure bs. 68k still is not good enough like it was 30 years ago. 68k still has not reached the cheap-PC-from-the-win95-era level. still there is no 68k at a rational price with all the things that were standard in 1995 like 2D/3D/FPU/MMU. ppc is still more alive and still much better. stop trolling and start working on 68k to get it at least to year 1995 level
|
I am actually going to respond to this comment from another thread here because I believe the failure of PPC is code density related. I saw a comment on Quora recently.
https://www.quora.com/Why-was-the-PowerPC-architecture-unable-to-keep-up-with-Intel-x86
Bob Colwell Quote:
Why was the PowerPC architecture unable to keep up with Intel x86?
|
The PowerPC architecture was clearly capable of keeping up with x86. The PowerPC chips vs the Intel chips, that’s where a gap developed.
In some alternate universe, our Intel P6 design team might perhaps be less preternaturally gifted, and their P6 turned out late, slow, and uncompetitive. Meanwhile, in that same alternate universe, the PowerPC chips of the 1990’s didn’t suffer so much from having multiple companies trying to collaborate on them. And in that universe, the PowerPC company management understood what the Intel in our universe knew: it’s all about volume shipments. All good things flow from that. In that alternate universe, it would be Intel trying to keep up.
|
If this is the real Bob Colwell, he was chief architect of the Pentium Pro, Pentium II, Pentium III, and Pentium 4.
https://en.wikipedia.org/wiki/Bob_Colwell
From a technical perspective, he believes PPC could compete. So code density does not matter? No. Code density is the reason PPC failed. PPC may have had good enough code density back then for the desktop market, but it did not have good enough code density for the embedded market. The 68k was abandoned in the hope that PPC would replace it in the embedded market and improve PPC chip volumes, but SuperH, and then ARM Thumb/Thumb-2 with 68k-like code density (though not 68k performance), replaced it instead. As Bob stated, "it’s all about volume shipments", and 10% of the desktop market could not compete against 90% of the desktop market without other markets like the embedded or server markets. Perhaps "the PowerPC company management" at Motorola "understood" the business side and the necessity of combining the desktop and embedded markets, but failed to appreciate the importance of code density at a technical level.
PPC is more dead and less viable than the 68k despite newer chips with higher performance. ARM rose up out of the embedded market, using Thumb-2 chip volumes to try to compete with x86-64 on the desktop, but switched to the higher performance AArch64 ISA, which resembles PPC but has better code density. The embedded market would never go back to PPC with AArch64 having better code density, even if PPC could somehow come back with similar features. PPC 32-bit code is ~20% larger than AArch64 64-bit code, and PPC 64-bit code is likely larger yet. The 68k remains competitive in code density with Thumb-2 and likely BA2, but has better performance traits. The basic embedded market features are timeless.
What features make the Motorola 68000 still relevant for embedded systems despite being much older than the Intel 486?
https://dev.to/adityabhuyan/what-features-make-the-motorola-68000-still-relevant-for-embedded-systems-despite-being-much-older-382p
Quote:
The Motorola 68000 (and its derivatives) remains relevant in certain embedded systems applications despite being introduced in 1979, a full decade before the Intel 486. This longevity is due to several key features that make it particularly well-suited for embedded applications where predictability, simplicity, and reliability matter more than raw computing power.
...
The 68000's continued relevance demonstrates that in embedded systems, newer isn't always better. The architecture's simplicity, predictability, and proven track record make it a viable choice for applications where these characteristics outweigh raw computational performance. While modern ARM Cortex-M processors have largely taken over the embedded market, the 68000 family still holds niches where its specific advantages shine.
|
The 68k remained relevant for decades without receiving development. The article above is about the 68000 specifically and lists some of the reasons, many of which apply to other 68k CPUs as well. The 68000 is still available directly from the Motorola successor NXP while the 32-bit CPUs are not. One of the reasons is that the available 68000 chips received an update to CMOS 3.3V full static designs. The 68020-68040 were 5V, while the 68040V and 68060 are CMOS 3.3V full static designs too. The embedded market moved to a 3.3V standard, and battery powered devices lower still, to 1.2V or 1.8V. This is possible due to silicon improvements: higher voltages provide faster transistor switching, but smaller chip processes can offset this with shorter distances for the electricity to travel.

The 68000 remained in production because it supported 3.3V, because it supported a cheaper 16-bit data bus, and because it was so easy to use. The 68060 replaced the 68040V and remained in production for many years, but eventually better value SoCs replaced MPUs. SoCs reduce the chip count and board space while the cost is not much higher than MPUs. The 68060 is a great embedded MPU and would likely be viable if turned into a SoC, which just so happens to be what the 68k Amiga needs. Surely even you understand that the 68k Amiga has a much larger market and more software than any NG Amiga? Did you understand Bob Colwell when he stated, "it’s all about volume shipments"?
Last edited by matthey on 20-Jul-2025 at 08:04 PM.
Last edited by matthey on 20-Jul-2025 at 08:00 PM.
Last edited by matthey on 20-Jul-2025 at 07:59 PM.
kolla
Re: The (Microprocessors) Code Density Hangout
Posted on 20-Jul-2025 23:34:46 [ #365 ]
Elite Member
Joined: 20-Aug-2003  Posts: 3559
From: Trondheim, Norway

@ppcamiga1
But MUI isn’t a GUI, it’s a GUI toolkit and a collection of BOOPSI classes for Intuition, which is the GUI… so what is it that you want ported? MUI for what unix GUI?
_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC
Karlos
Re: The (Microprocessors) Code Density Hangout
Posted on 21-Jul-2025 11:08:42 [ #366 ]
Elite Member
Joined: 24-Aug-2003  Posts: 5017
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!
ppcamiga1
Re: The (Microprocessors) Code Density Hangout
Posted on 21-Jul-2025 18:10:44 [ #367 ]
Super Member
Joined: 23-Aug-2015  Posts: 1145
From: Unknown

@kolla
Stop trolling, start working on MUI, port it to unix.
Hammer
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jul-2025 4:11:36 [ #368 ]
Elite Member
Joined: 9-Mar-2003  Posts: 6704
From: Australia

@matthey
Quote:
Most RISC ISAs with 32-bit fixed length encodings like MIPS, SPARC, ARM, Alpha, PA-RISC and PPC practically required a 32-bit data bus/memory while multiplexed address and data busses were common for RISC cores.
|
FYI: for the 1986-released IBM RT PC, the IBM ROMP CPU family had 16-bit and 32-bit RISC instructions before Hitachi SuperH. The IBM ROMP CPU has 16 32-bit GPRs.
IBM ROMP is the precursor to the fixed-length 32-bit instruction set POWER1 (which led to PowerPC).
When there were competitive pressures in the MPU markets, the NXP/STM PowerPC camp added 16-bit VLE instructions to PowerPC. If there's a will, there's a way. That's the difference between alive and dead: the living can adapt while the dead decay.
The same goes for MIPS's 16-bit instruction extensions. MIPS has since combined with RISC-V, and MIPS has CCP-state backing via Loongson.
SPARC, Alpha, and PA-RISC's target markets are high-performance workstations with a bias towards arithmetic intensity.
The HP PA-RISC family focuses on multi-operation fused add and multiply instructions. HP PA-RISC workstations replaced HP's 68K Unix workstations. Memory-operand arithmetic instruction complexity was traded for multi-operation fused add and multiply instructions. Newer HP PA-RISC can be configured for little-endian or big-endian modes; earlier PA-RISC are big-endian. HP PA-RISC was replaced by Intel Itanium and later Softbank AArch64.
DEC Alpha focuses on high-clocked, streamlined pipeline designs. DEC Alpha replaced DEC's MIPS and 68K workstations. DEC Alpha can be configured for little- or big-endian modes. DEC was gutted by Intel, Compaq, and Microsoft.
SUN SPARC replaced SUN's 68K workstations. SUN SPARCs are big-endian. SPARC was purchased by Oracle and has been cloned by other entities, e.g. the Russian state's MCST R2000, Cobham Limited (UK)'s LEON5, and the CCP state's Galaxy FT-1500 (FeiTeng).
For their big-endian capable CPU product lines, HP, DEC, and SUN didn't have the engineering resources to design high-performance, high clock speed complex CPUs like Intel or AMD.
Quote:
ARM and PPC first used unified caches instead of separate instruction and data caches which are better for cache/memory performance. RISC load/store ISAs have much increased memory traffic for code and data with these memory access handicaps.
|
Living CPU families adapt over time.
ARM's major upgrade was with DEC's StrongARM, which was released in February 1996 with 100, 150, and 200 MHz clock speeds. Faster 166 and 233 MHz versions were announced on 12 September 1996. DEC's specialty with high-speed pipelining was applied to ARM ISA.
StrongARM's design wins among smart handheld vendors set the template for ARM's dominance in the smart handheld market. Apple's iPhone has its origins in Apple's ARM/StrongARM-based PDAs, and the same goes for the ARM9T-based smart handheld competitors.
Quote:
The 68000 had a 16-bit data bus and non-multiplexed address and data busses. The 68020-68060 had a 32-bit data bus, non-multiplexed address and data busses and separate instruction and data caches. The 68k ISA minimized memory traffic at the same time. The embedded market understood the advantages but the rest of the computer market fell for the RISC hype despite the major memory bottleneck. RISC CPUs were often cheaper but 68k systems often remained competitive in performance and price.
|
For the 1994 and 1995 time window, you haven't produced a PS1-level console business plan with proper BOM costing around Motorola's 68xx040 or 68xx060.
Prove that "RISC CPUs were often cheaper but 68k systems often remained competitive in performance and price".
For 3DO's business, IBM provided two PPC 602s with FP32 pipelines.
Each PPC 602 consumes about 1 million transistors, which includes a fully pipelined FP32 unit and a 66 MHz clock speed. IBM was willing to customize PPC cores for 3DO's target budget. 3DO M2's dual CPU budget allocation is 2 million transistors with dual FP32 pipeline sets. IBM resources backed the game console PowerPC adventure until it was beaten by AMD and NVIDIA's ARM. IBM didn't have a GPU bundle offer.
The FP32 target matched later PC gaming's vertex shading and hardware TnL FP32 datatype.
The 68060 consumes 2.1 million transistors with a non-pipelined FPU.
Both the Amiga Hombre baseline and the Sony PlayStation 1 have about 1 million transistor budgets.
The Xbox project had a massive PC market to subsidize game console CPU offers, and it wasn't Motorola's.
For budget-class x86 CPUs of about 1.x million transistors, both Intel and AMD ramped up 486 clock speeds beyond 66 MHz, e.g. the 486DX4 at 100 MHz for Intel and the 5x86 at 130 to 160 MHz for AMD.
Intel and AMD managed to maintain economies of scale with fat CPU designs, which is not the same for the 68060!
A 68040 at 133 MHz for a budget price is not in Motorola's DNA.
Last edited by Hammer on 23-Jul-2025 at 06:46 AM.
Last edited by Hammer on 23-Jul-2025 at 04:50 AM.
Last edited by Hammer on 23-Jul-2025 at 04:37 AM.
Hammer
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jul-2025 6:16:10 [ #369 ]
Elite Member
Joined: 9-Mar-2003  Posts: 6704
From: Australia

@matthey
Quote:
From a technical perspective, he believes PPC could compete. So code density does not matter? No. Code density is the reason PPC failed. PPC may have had good enough code density back then for the desktop market but it did not have good enough code density for the embedded market.
|
For the Xbox One and PS4 game console embedded market, when both competing CPU designs issue two instructions per cycle, AMD Jaguar defeated IBM PowerPC A2 (the PPE successor).
Code density can be a factor in IBM PowerPC A2+custom vs AMD Jaguar, but AMD's deal was superior due to the Pitcairn-class GCN GPU bundle.
Intel SSSE3 maintains SIMD density parity with IBM SPU, e.g. https://whatcookie.github.io/posts/why-is-avx-512-useful-for-rpcs3/
Quote:
The performance when targeting SSE2 is absolutely terrible, likely due to the lack of the pshufb instruction from SSSE3. pshufb is invaluable for emulating the shufb instruction
|
AMD Jaguar includes the pshufb instruction from SSSE3. Packed byte vector instruction sets are important for games. If it's useful for games, the PC will assimilate it.
A 3.2 GHz 68060++++ effort with scalar code against the SPU's 128-bit packed byte vectors wouldn't be pretty, i.e. roughly 200 MHz effective for a 3.2 GHz 68060++++.
x86 with SSE2 wouldn't be able to match the SPU's shufb instruction.
Lisa Su obtained PlayStation 3's business for IBM, and CELL had instruction set customizations for games. Intel wasn't asleep at the wheel either, since Intel's SSSE3 was released with Core 2 Woodcrest in June 2006.
Against AMD's compact Zen 2 (PS5) and full-size laptop Zen 2 (Xbox Series X), IBM didn't recover for the Xbox Series family and PS5 / PS5 Pro contracts. The same goes for the Xbox Next and PS6 bids.
IBM lost major game console contracts to AMD's x86-64 v2+/v3 and ARMv8 (via NVIDIA; ARM has smart handheld momentum).
AMD's compact Zen 2 for PS5 was cost-reduced beyond the laptop Zen 2 variant (half the L3 cache of the full desktop Zen 2 variant).
x86 vendors maintained SIMD extensions designed for games, since gaming is a major use case for home PCs and game PC servers, which is a business. Notice x86 vendors treating the "toy business" with respect, i.e. money is money.
Motorola didn't respect gamers. Henri Rubin didn't respect gamers. The administration would need to change for a game-friendly 68060+ variant.
Thanks to Intel for providing the cost-effective Celeron 300A, a legendary gamer CPU when overclocked to 450 MHz. Intel Haswell was a good gaming CPU.
It's good that Intel bid for Sony's PS6 contract despite losing to AMD, i.e. Intel hasn't forgotten about gamers. Intel Arrow Lake was unfortunate, but Intel promised better gaming performance with Nova Lake and competition to AMD's Strix Halo and AMD's future big APUs.
Hammer
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jul-2025 6:34:48 [ #370 ]
Elite Member
Joined: 9-Mar-2003  Posts: 6704
From: Australia

@kolla
Quote:
Why? There’s no unix software that would use it, and Amiga software that relies on MUI doesn’t run natively on unix, so what’s the point? MUI can never work on unix, as it relies on AmigaOS features such as shared memory between the programs, which is exactly the opposite of what’s typical for unix, where each program gets its own memory.
Besides that, the functionality of MUI already exists on UNIX, Qt, GTK and whatever Apple is doing, are examples of this.
|
On a Linux/NT-type OS, Amiga's shared memory address space needs to be sandboxed for legacy Amiga apps. The user should have the option to spawn multiple Amiga sandboxes or a single Amiga sandbox.
The PC's Win16 had a shared memory design, and it was VM-boxed.
Amiga's legacy is 32-bit instead of the PC's 16-bit Win16 legacy.
AmigaOS 4.x PPC didn't reach an NT-style evolution stage.
minator
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jul-2025 17:26:58 [ #371 ]
Super Member
Joined: 23-Mar-2004  Posts: 1046
From: Cambridge

@Hammer
Quote:
FYI, For the 1986 released IBM RT PC, the IBM ROMP CPU family has 16-bit and 32-bit RISC instruction sets before Hitachi SuperH. IBM ROMP CPU has 16 32-bit GPRs.
|
The Seymour Cray-designed CDC 6600 had 2 instruction sizes (15 and 30 bits) way back in 1963. However, this was done in a very simple way, so they are very simple and quick to decode. You'll often see this machine described as the first RISC machine: "Really Invented by Seymour Cray".
_________________
Whyzzat?
Hammer
Re: The (Microprocessors) Code Density Hangout
Posted on 24-Jul-2025 2:12:29 [ #372 ]
Elite Member
Joined: 9-Mar-2003  Posts: 6704
From: Australia

@minator
Quote:
minator wrote: @Hammer
Quote:
FYI, For the 1986 released IBM RT PC, the IBM ROMP CPU family has 16-bit and 32-bit RISC instruction sets before Hitachi SuperH. IBM ROMP CPU has 16 32-bit GPRs.
The Seymour Cray designed CDC 6600 had 2 instruction sizes (15 and 30 bit) way back in 1963. However, this was done in a very simple way so they are very simple and quick to decode. You'll often see this machine described as the first RISC machine: "Really Invented by Seymour Cray".
|
1. That's not a 16-bit and 32-bit instruction set.
2. The mid-1980s RISC camp promoted a one-instruction-per-clock-cycle doctrine.
https://en.wikipedia.org/wiki/CDC_6000_series
Quote:
Depending on instruction type, an instruction can take anywhere from five clock cycles for 18-bit integer arithmetic to as many as 68 clock cycles (60-bit population count)
matthey
Re: The (Microprocessors) Code Density Hangout
Posted on 24-Jul-2025 2:40:25 [ #373 ]
Elite Member
Joined: 14-Mar-2007  Posts: 2828
From: Kansas

minator Quote:
The Seymour Cray designed CDC 6600 had 2 instruction sizes (15 and 30 bit) way back in 1963. However, this was done in a very simple way so they are very simple and quick to decode. You'll often see this machine described as the first RISC machine: "Really Invented by Seymour Cray".
|
The CDC 6600 was very influential and generally a simpler design than most other CPUs at the time, but I hesitate to call it RISC. It is a strange architecture with 15-bit and 30-bit instructions, 60-bit memory accesses, 18-bit addresses, and dedicated but partially orthogonal registers such as Xn, An, and Bn; if I understand correctly, it is not really even load/store, as a write to an An register triggers load/store accesses.
https://en.wikipedia.org/wiki/Talk:CDC_6600#Instruction-set_architecture
In support of your argument that the CDC is early RISC, David Patterson mentions the CDC 7600 in the errata of his famous "The Case for the Reduced Instruction Set Computer" paper as an example of a simple computer but he never calls it RISC. The IBM 801 is used as the best example of RISC.
The Case for the Reduced Instruction Set Computer
https://www.cs.utexas.edu/~fussell/courses/cs352h/papers/risc.pdf
Quote:
At IBM. Undoubtedly the best example of RISC is the 801 Minicomputer, developed by IBM research at Yorktown Heights, N.Y. This project is several years old and has had a large design team exploring the use of the RISC architecture in combination with very advanced compiler technology. Though many details are lacking their early results seem quite extraordinary. They are able to benchmark programs in a subset of PL/I that runs about five times the performance of an IBM S/370 model 168. We are certainly looking forward to more detailed information.
|
John Cocke is mentioned earlier in the paper as well.
https://en.wikipedia.org/wiki/John_Cocke_(computer_scientist)
Quote:
John Cocke (May 30, 1925 – July 16, 2002) was an American computer scientist at IBM and recognized for his large contribution to computer architecture and optimizing compiler design. He is considered by many to be "the father of RISC architecture."
|
John Cocke led the IBM 801 team, which resulted in the more practical IBM ROMP MPU with 16-bit and 32-bit instruction sizes. The IBM RS/6000, POWER and PPC are based on the development of his team, as opposed to many other RISC ISAs like MIPS and SPARC, which had more influence from David Patterson's academic research. Much of IBM's early RISC research was published, so David's work is influenced by it. While David acknowledges "code compaction is important", he de-emphasizes its importance. IBM's research, and the choice to create the ROMP MPU instead of simply producing an MPU version of the IBM 801 minicomputer CPU, demonstrate a different opinion of the importance of code density among IBM researchers.
IBM RT Personal Computer Technology
https://bitsavers.org/pdf/ibm/pc/rt/SA23-1057_IBM_RT_Personal_Computer_Technology_1986.pdf
Quote:
ROMP/801 Differences
Although the 801 and ROMP have a common heritage, some important differences exist between the two. The 801 assumed the use of two cache memories, one for instructions and one for data. A requirement for caches was not incorporated into the ROMP design for cost and complexity reasons. Since the ROMP can execute an instruction almost every processor cycle, an efficient memory interface capable of high bandwidth was a requirement. Two key features of the ROMP design which greatly reduce memory bandwidth limitations are: the Instruction Prefetch Buffer and the use of 16-bit, in addition to 32-bit, instructions. The ROMP contains a 16-byte instruction prefetch buffer which practically guarantees that all sequentially accessed instructions are available for execution when they are needed.
The 801 migrated to all 32-bit instructions while the ROMP maintained both 16- and 32-bit instructions. The judicious use of 16-bit instructions decreases memory code space and allows more code per real-page frame in a virtual memory system, resulting in fewer page faults and improved system performance. More importantly, the shorter average instruction length of the ROMP decreases the memory bandwidth required for instruction fetches. For example, an instruction mix containing 30% Load and Store instructions (which require 32 bits of memory reference each for data) would require 41.6 bits of memory bandwidth per instruction if all instructions are 32 bits long. The same instruction mix executed in the ROMP, where the average instruction length (weighted average of 16- and 32-bit instructions) is about 20 bits (2.5 bytes), only requires an average of 29.6 bits for each instruction. This is a reduction in memory bandwidth requirement of almost 30% per instruction for the ROMP over a design which contains only 32-bit instructions. Since memory bandwidth is usually the performance-limiting factor, a 30% reduction in the bandwidth requirement will certainly improve performance in a non-cache system.
It must be recognized that a machine with all 32-bit instructions should do more "work" for each instruction executed than a machine with some instructions that can only be executed in a 16-bit format. That is, an equivalent MIP (Million Instructions Processed per second) rate for a machine with only 32-bit instructions should represent more processing capability than the same MIP rate for a machine with both 16- and 32-bit instructions. One of the limitations of 16-bit instructions is the limited number of bits available to specify operation codes, registers, displacements, etc. This limitation is one of the reasons that the 801 uses 32-bit instructions exclusively. Use of only 32-bit instructions permits the register specification fields to contain the 5 bits required to select one of 32 general-purpose registers (GPRs). The limit of 16 registers for the ROMP results in only a modest increase in Load and Store frequency, since the PL.8 compiler performs an efficient register optimization. A primary motivation for having 32 registers is efficient emulation of other architectures which have 16 general-purpose registers (i.e., System/370). The ROMP does an excellent job of emulating other machines which have a more limited register set. The 801 is significantly better at 370 emulation. Aside from emulation, the use of all 32-bit instructions is estimated to make the 801 MIP rate about 15% to 20% more powerful than the ROMP MIP rate. That is, software path lengths for 801 programs are about 15% to 20% shorter than they are for equivalent ROMP programs.
The use of both 16- and 32-bit instructions adds some design complexity. Instruction handling and decoding must account for instruction location on both 16- and 32-bit boundaries. The 16-byte Instruction Buffer and its management also adds complexity. However, studies have shown that the 16-byte Instruction Buffer provides about the same performance advantage as a 256-byte instruction cache, with a significant savings in the silicon required for implementation.
The design point chosen for the ROMP is well suited for a microprocessor VLSI design. Good performance is achieved with readily available memories and the silicon area requirements are a good fit for our SGP technology. The ROMP's dual 16- and 32-bit instruction format provides about a 10% net performance advantage over an equivalent 801 microprocessor in non-cache systems.
Compiler Development for ROMP & 801
The PL.8 compiler was initially developed for the 801 project in Research as part of the exploration of the interaction of computer architecture, system design, programming language, and compiler techniques. The adaptation of this compiler to the ROMP architecture was done in Austin. A single compiler was maintained with the addition of another "backend" for the ROMP. This involved a complex working relationship between Research and Austin. This excellent relationship has continued over the years with enhancements and modifications being made by both groups. The compiler is currently owned by Austin with enhancements being made by both groups.
The PL.8 compiler currently supports three source languages, Pascal, C, and PL.8, a PL/I variant designed to be suitable for generation of efficient object code for systems programming. Object code is produced for the 801, ROMP, System/370, and MC68000.
The ROMP PL.8 compiler development influenced the design of the ROMP instructions in a number of significant ways. The goal of program storage (byte) efficiency caused the following modifications to be made:
1. Short (16 bits) forms of several instructions were introduced to provide for the special case of an immediate operand with value less than 16. For example, Add Immediate, Subtract Immediate, Compare Immediate, and Load Immediate were provided.
2. A short-form relative jump instruction was added with maximum displacement of plus or minus 256 bytes.
3. The long (32-bit) Branch instructions were defined to be relative rather than absolute in order to reduce the storage necessary for relocation information from modules.
4. A Load Character instruction was added in order to handle character data with fewer bytes.
In addition, Load Multiple and Store Multiple instructions were provided to improve the speed of subroutine linkage. The resultant ROMP architecture proved to require about 30% fewer bytes than 801 for a selected set of benchmarks.
...
In addition, the PL.8 compiler uses LALR parser generator techniques. Syntax-directed translation enables the compiler to associate the intermediate code generation directly with the syntactic structure of the source language. Furthermore, it uses a map-coloring algorithm from topology for register allocation. Most programs of reasonable size color in 16 GPRs without spilling. 32 GPRs would reduce spilling on larger programs but would require 5 bits for register specification, which would require 32-bit instructions. The trade-off was made in favor of the use of 16-bit instructions (with the 25% to 30% performance advantage) at the performance detriment of large programs.
|
IBM's research and conclusions are good. The ROMP RISC ISA made better choices than many later RISC ISAs. It has 16 GP registers without wasting any on the PC or a zero register, it supports 16-bit and 32-bit encodings from inception for much better code density, and it supports 8-bit, 16-bit and 32-bit integer datatypes. It has advantages here over the original ARM, SuperH, Thumb and MIPS16 compressed RISC ISAs, and it came earlier, giving it another advantage. There were mistakes too, though.
1. Instruction buffer - the RISC mentality of simplifying away caches did not scale like a cache; it was better to spend the RISC transistor savings on caches, as David Patterson suggested. The researchers also failed to see that code density savings apply to caches, and that the transistor savings from improved code density grow with cache size, which Patterson also failed to realize, hence the need for RISC-V do-over #4.
2. No integrated FPU plan - the RISC simplification philosophy left the FPU off chip, where a NS32081 CISC FPU was used. The NS32081 includes mem-mem, mem-reg and reg-mem operations; on the NS32k, immediates/displacements are encoded big endian with 68k-like size scaling, while memory accesses are little endian.
3. Branch delay slot - does not scale with deeper pipelines and branch prediction, but this was a common RISC mistake.
4. Virtualization overhead - ROMP has workstation/server-like virtualization support, increasing costs for the embedded and desktop markets, where the original IBM 801 did not.
5. No code metric comparison with the 68k - their high-tech compiler supported the 68000, and a comparison with 68k code should have shown that 16 GP registers can support code with a minimal increase in instructions executed (the common compressed-RISC execution-path problem), instead of the "15% to 20% shorter" paths with just fixed-length 32-bit instruction encodings, which seems high and likely could have been improved with better code analysis and tuning of the ISA immediates and displacements. Full scaling of immediates and displacements inside instructions, as in the 68k, is also possible even for load/store architectures, as the BA2 ISA demonstrates, though a 16-bit variable-length encoding like the 68k's is likely more efficient than BA2's 8-bit variable-length byte encoding.
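The first point can be put in numbers with a first-order model: a fixed-size I-cache holds more instructions when code is denser, so the absolute win from density grows with cache size. The bytes-per-instruction averages below are illustrative assumptions, not measured data:

```python
def icache_reach(cache_bytes, avg_insn_bytes):
    """Instructions a fixed-size I-cache can hold, to first order."""
    return int(cache_bytes // avg_insn_bytes)

# assumed averages: 4.0 bytes/insn for a fixed 32-bit RISC,
# ~3.0 bytes/insn for a 16/32-bit compressed encoding
for size in (4096, 32768, 262144):  # 4kiB, 32kiB, 256kiB
    gain = icache_reach(size, 3.0) - icache_reach(size, 4.0)
    print(size, gain)  # absolute gain in cached instructions grows with size
```

The relative gain is constant, but the transistor-equivalent savings (extra instructions held without adding SRAM) scale linearly with the cache, which is the argument above.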
The 1st through 3rd issues above are not catastrophic. The NS FPU could have been replaced. Caches likely could have been added, but the cache savings from the instruction buffer was a selling point that management may not have liked to give up when compared to other ISAs with growing caches. A similar marketing problem may have existed with the RISC hype in regard to variable-length instruction encodings. The RISC philosophy was to be simple, and RISC stands for "Reduced Instruction Set Computer", yet ROMP increased the number of instructions to 118, far beyond the 68000's 56 instructions and most other fixed 32-bit RISC ISAs. It is the scaled immediates/displacements and powerful, orthogonal addressing modes that give the 68k more of a reduced instruction set than most so-called RISC ISAs today.
One last funny comment in the ROMP documentation actually makes a good case for using a CISC FPU.
IBM RT Personal Computer Technology https://bitsavers.org/pdf/ibm/pc/rt/SA23-1057_IBM_RT_Personal_Computer_Technology_1986.pdf Quote:
Several relationships among these execution times are of interest. An ADD or MUL Single memory-to-memory is only 0.4 µs slower than the register-to-register version of the commands. An ADD or MUL Double memory-to-memory is only 1.0 µs slower than the register-to-register versions. The time to load one single precision operand into an NS32081 register is 2.6 µs, or 2.2 µs slower than executing an ADD or MUL using the same operand memory-to-memory versus register-to-register. The time to load a register double is 1.8 µs slower than executing an ADD or MUL memory-to-memory versus register-to-register. The times to move an operand to memory are slightly slower still.
While adds and multiplies are the most often used floating point arithmetic operations, loads and stores occur even more often. In particular, at least one non-register-resident operand is needed for most floating point operations. Computations commonly occurring in engineering and scientific problems such as matrix inversion, dot product evaluation, and polynomial evaluation seem to have this characteristic. Fast load and store commands are one key to the performance of a floating point accelerator.
|
A single-precision operation on two fp numbers in memory performs better than loading one into a register and then performing the operation. Also, at least one fp number is usually in memory when performing an operation. This makes the case for a CISC FPU, even though a CISC FPU does not have as much of a performance advantage over a RISC FPU as CISC integer units have over RISC integer units. In either case, RISC requires more GP registers to avoid stalls, and register spills are much more expensive, as are memory accesses in general. A RISC pipeline may have a small advantage with reg-reg operations, but a CISC pipeline has a large advantage when accessing memory/caches, even without good code density, as x86-64 shows.
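The arithmetic behind that conclusion, using the µs deltas from the quoted NS32081 measurements. The absolute reg-reg time is a placeholder; only the difference between the two paths matters:

```python
MEM_MEM_PENALTY = 0.4  # µs: mem-mem ADD/MUL Single vs reg-reg (quoted)
LOAD_SINGLE = 2.6      # µs: load one single operand into an FPU register

def cisc_op(reg_reg_time):
    """Fused memory-operand ADD/MUL, CISC style: one instruction."""
    return reg_reg_time + MEM_MEM_PENALTY

def risc_op(reg_reg_time):
    """Explicit load, then reg-reg ADD/MUL, load/store style."""
    return LOAD_SINGLE + reg_reg_time

t = 1.0  # placeholder reg-reg time in µs; it cancels out of the comparison
print(risc_op(t) - cisc_op(t))  # ~2.2 µs, matching the quoted figure
```

So every non-register-resident operand costs the load/store path about 2.2 µs more than the fused memory-operand path, which is why the quote calls fast loads and stores "one key" to accelerator performance.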
Last edited by matthey on 24-Jul-2025 at 02:53 AM.
|
| | Status: Offline |
| | Hammer
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 24-Jul-2025 3:18:51
| | [ #374 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6704
From: Australia | | |
|
| @matthey
https://old.chipsandcheese.com/2025/01/30/a-risc-v-progress-check-benchmarking-p550-and-c910/ Benchmarking RISC-V P550 and C910 against others.
SPEC CPU2017 estimated scores:
Intel Celeron J4125 (Goldmont Plus) @ 2.7 GHz - Integer: 2.44, Floating Point: 2.49, 525.x264 ratio: 4.87
Amlogic S922X (ARM Cortex A73) @ 2.2 GHz - Integer: 1.84, Floating Point: 2.02, 525.x264 ratio: 3.85
MediaTek Genio 1200 (ARM Cortex A55) @ 2 GHz - Integer: 1.19, Floating Point: 1.01, 525.x264: not shown
Eswin EIC7700X (SiFive P550) @ 1.4 GHz - Integer: 1.17, Floating Point: 1.29, 525.x264 ratio: 1.46
T-Head TH1520 (C910) @ 1.85 GHz - Integer: 0.99, Floating Point: 1.29, 525.x264 ratio: 1.12
Amlogic S922X (ARM Cortex A53) @ 1.9 GHz - Integer: 0.85, Floating Point: 0.66, 525.x264: not shown
SPEC CPU 2017 Integer Workload IPC https://old.chipsandcheese.com/2025/01/30/a-risc-v-progress-check-benchmarking-p550-and-c910/p550_c910_specint_ipc/
Eswin's SiFive P550 shows good IPC, but its SPEC scores don't reflect that.
------------------------------ x264 Encode 1280x720 Video, preset=veryslow
AMD K10 Athlon II X4 651 (quad core Stars): 5 fps
Intel Celeron J4125 (quad core Goldmont Plus): 4.29 fps
Amlogic S922X (quad core A73): 2.88 fps
MediaTek Genio 1200 (quad core A55): 1.56 fps
Eswin EIC7700X (quad core SiFive P550): 0.43 fps with scalar
T-Head TH1520 (quad core C910): 0.31 fps with scalar
x264 Encode 1280x720 Video, IPC:
Eswin EIC7700X (quad core SiFive P550): 2.53
T-Head TH1520 (quad core C910): 1.38
AMD K10 Athlon II X4 651 (quad core Stars): 1.14
Intel Celeron J4125 (quad core Goldmont Plus): 1.01
Amlogic S922X (quad core A73): 0.78
MediaTek Genio 1200 (quad core A55): 0.55
x264 Encode, executed instructions (instructions retired, in trillions):
Eswin EIC7700X (quad core SiFive P550): 41.64
T-Head TH1520 (quad core C910): 41.62
AMD K10 Athlon II X4 651 (quad core Stars): 3.23
Intel Celeron J4125 (quad core Goldmont Plus): 2.93
MediaTek Genio 1200 (quad core A55): 2.88
Amlogic S922X (quad core A73): 2.87
But IPC in isolation is misleading. Clock speed is an obvious factor. Instruction counts are another. In x264, the two RISC-V cores have to execute so many more instructions to get the same work done that IPC becomes meaningless.
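Plugging the figures above into a first-order model, time = instructions / (IPC × clock × cores), reproduces roughly the ~10x fps gap between the P550 and Goldmont Plus despite the P550's 2.5x IPC advantage:

```python
def encode_seconds(insns, ipc, ghz, cores=4):
    """Seconds to retire `insns` instructions at a given average IPC
    and clock: time = instructions / (IPC * clock * cores)."""
    return insns / (ipc * ghz * 1e9 * cores)

# figures quoted above; instructions retired are in trillions
p550 = encode_seconds(41.64e12, ipc=2.53, ghz=1.4)
goldmont = encode_seconds(2.93e12, ipc=1.01, ghz=2.7)
print(p550 / goldmont)  # ~11x slower: a 14x instruction count swamps the IPC lead
```

The ~14x instruction-count gap and ~2x clock deficit overwhelm the IPC advantage, which is the point: IPC in isolation says nothing about delivered performance.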
X86-64 and ARM A55/A73 have the benefit of quality multimedia SIMD extensions with a software codebase to match.
Good multimedia instructions can reduce the instruction count needed to get the work done.
Why the x264 use case? It's the social media video revolution, i.e. the modern multimedia use case.
I wonder who departed from Amiga's multimedia target audience and jumped to a pure embedded microcontroller target?
The Amiga chipset is the value-added multimedia extension for 68000.
68060 would need a lot of work to get it modern enough for embedded multimedia, let alone Amiga being reborn as a Nintendo Switch hardware competitor.
---------------------------
For the recently optimized OutRun AGA port, i.e. https://www.youtube.com/watch?v=WZzTp3vSC0g the minimum CPU is a 68030 @ 50MHz, but it wants a 68LC040-class CPU.
From Commodore - The Final Years book for 1991 era system integration phase for 68EC040 @ 25Mhz with BOM costing estimates
The engineers wanted to allow the then-new Motorola 68040 processor to work with the next generation of Amigas. And of course, the architecture would work with the upcoming AAA chipset, as well as the more imminent AA. And because AAA was designed to work with different processor families, Haynie wanted his Acutiator motherboard to also handle different processors. Specifically, there were at least three major RISC processor families at that time and he wanted Acutiator to accept these RISC chips.
Their architecture required three custom chips: EPIC, AMOS, and SAIL. In cost comparisons, Haynie calculated that the Acutiator architecture would add approximately $125 to a system (including the cost of a 68EC040 chip), resulting in a $300 retail price increase. This was a bargain, considering the user received a significant processor upgrade. Haynie proposed that Commodore should assign Scott Shaeffer, Paul Lassa, and himself to each create the three required gate array chips. He expected prototypes in 7 to 9 months, with the first systems shipping in 1992.
For Xmas Q4 1992 sales target, Acutiator chipset + 68EC040 @ 25MHz has $125 BOM cost. Add this BOM cost to a potential CD32-type game console.
For Xmas Q4 1995 sales target, Acutiator chipset + 68EC060 @ 50 Mhz. 68EC060's cost is higher than 1991 era 68EC040-25's cost.
That's a Motorola fat 68K CPU-centric play targeting 3DO's US$699 price range.
https://www.techmonitor.ai/technology/motorola_plans_to_sample_the_68060_next_quarter
There are also cheaper 68LC060 and 68EC060 variants of the new part, which omit the memory manager, and both the memory manager and the floating point unit; they cost $169 and $150 respectively for 10,000-up
For 1992, 68EC040-25's wholesale cost is similar to AMD's Am386-40's wholesale cost.
For integrated MMUs, Motorola set premium prices on these SKUs, following AT&T's expensive Unix license pricing.
For the 1995 era of 100 MHz budget builds, i.e. DEC StrongARM @ 100MHz, Intel 486DX4-100 or AMD 5x86 @ 100MHz, a high-clock 68EC040 @ 100MHz may have been enough, but Motorola never did it. Freescale/NXP has offered a 68040V @ 66MHz in modern times. LOL.
https://archive.computerhistory.org/resources/access/text/2013/04/102723262-05-01-acc.pdf DataQuest 1991, page 119 of 981
For 1992:
68EC020-16: $16.06
68EC020-25: $19.99
68020-25: $35.13
68EC030-25: $35.94 (missing MMU, not Unix capable, used in A4000/030)
68030-25: $108.75
68040-25: $418.52
68EC040-25: $112.50 (missing MMU and FPU; Commodore management rejected the Acutiator glue chips for Amiga)
Am386-40: $102.50
R3000-25: $96.31 (R3050 with embedded MMU is cheaper)
https://www.electronicproducts.com/mips-processors-to-push-performance-and-price/ From 1992, the IDT MIPS R3040 @ 20 MHz had a $15 price; that's 68LC040-like 1-IPC class performance at a budget price.
PlayStation 1's LSI Logic R3050 selection is a no-brainer.
Last edited by Hammer on 24-Jul-2025 at 03:26 AM.
_________________
|
| | Status: Offline |
| | Hammer
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 24-Jul-2025 3:42:08
| | [ #375 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6704
From: Australia | | |
|
| @matthey
Quote:
One last funny comment in the ROMP documentation actually makes a good case for using a CISC FPU.
(skip)
A single single precision operation of two fp numbers in memory has better performance than loading one into a register and then performing the operation. Also, at least one fp number is usually in memory when performing an operation. This just made the case for a CISC FPU and a CISC FPU does not have as much of a performance advantage over a RISC FPU as a CISC integer units over RISC integer units. In either case, RISC requires more GP registers to avoid stalls and register spills are much more expensive as are memory accesses in general. A RISC pipeline may have a small advantage with reg-reg operations but a CISC pipeline has a large advantage when accessing memory/caches, even without good code density like x86-64.
|
CISC = fused implied data load with arithmetic operation. RISC = explicit data load operation, separate arithmetic operation.
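A sketch of that distinction as a toy instruction lowering. The mnemonics are made up for illustration, not any real assembler syntax:

```python
def lower_add_from_memory(dst, addr, model):
    """How `dst += *addr` lowers under each model; mnemonics are
    illustrative only."""
    if model == "cisc":
        return [f"add {dst}, ({addr})"]      # load fused into the add
    return [f"load tmp, ({addr})",           # explicit load first...
            f"add {dst}, tmp"]               # ...then a reg-reg add

cisc = lower_add_from_memory("d0", "a0", "cisc")
risc = lower_add_from_memory("d0", "a0", "risc")
print(len(cisc), len(risc))  # 1 vs 2 instructions for the same work
```

The load/store model also burns a temporary register per memory operand, which is part of why RISC designs want more GP registers.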
The Amiga Hombre PA-RISC argument is to trade the complexity of a fused memory operand (implied data load) on scalar arithmetic operations for SIMD with RISC's explicit load operation.
The PA-RISC argument is for multiple arithmetic operations per instruction and a smaller CPU core.
Amiga Hombre's two ASICs have a 1 million transistor budget.
Sony PlayStation 1 also has a 1 million transistor budget. LSI Logic and Toshiba squeeze in MIPS R3050+GTE and Toshiba's GPU.
3DO M2 has a 2 million transistor budget for the two PPC 602 CPUs backed by IBM.
X86 vendors wouldn't revisit the game console market until Xbox's system integration phase, which targets MIPS's price vs performance points. IBM (via PPC G3 with custom 64-bit SIMD) would also make its move against MIPS i.e. N64's MIPS was dumped for IBM's customized 64-bit SIMD-equipped PPC G3. MIPS was struggling with high clock speed in mass production.
Motorola's fat CPUs wouldn't be participating in game consoles after the 16-bit generation.
Motorola wouldn't modify the 68030 or 68040 with a "multimedia" SIMD; such a combo CPU+SIMD would threaten their discrete DSP business. Apple would need to drag Motorola into the AltiVec SIMD. AMD has no problem bundling IGP/DSP/NPU with CPUs since discrete GPUs are larger-scale.
Freescale/NXP still treats Altivec as a premium option between PPC e5500 vs PPC e6500.
Intel vs AMD's "many cores" competition with AVX-512 has effectively resurrected the Larrabee project.
Last edited by Hammer on 24-Jul-2025 at 04:14 AM.
_________________
|
| | Status: Offline |
| | matthey
|  |
Re: The (Microprocessors) Code Density Hangout Posted on 25-Jul-2025 1:11:06
| | [ #376 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2828
From: Kansas | | |
|
| Hammer Quote:
Eswin's SiFive 550 shows good IPC, but SPEC scores didn't reflect that.
|
Most compressed RISC ISAs have to execute more instructions, and RISC-V compression is no exception. The RISC-V philosophy of remaining one of the simplest RISC ISAs continues to be a handicap.

The only reference to another compressed RISC encoding is Thumb-2, which uses ~8% fewer integer instructions, but Thumb-2 has an elevated instructions-executed count too, of 10% to 30% depending on the comparison architecture, with the original ARM ISA closer to 10% and the 30% closer to the AArch64 ISA used by the competition here. With optimal code, RISC-V cores using compressed ISAs may have to execute 30% to 40% more instructions to keep up with AArch64 cores at the same frequency. SuperH, Thumb and MIPS16 may have to execute 50% more instructions to keep up.

AArch64 is very good at minimizing the number of instructions executed by using a large rather than Reduced Instruction Set Computer (RISC) ISA with powerful addressing modes like the 68k. In comparison, a 68k core using the compressed 68k ISA only has to execute ~5% more instructions to keep up with AArch64, and proposed ISA enhancements would reduce this small disadvantage. The 68k is the reduced instruction set compared to AArch64, yet remains competitive with a smaller instruction set due to scaled immediates/displacements encoded as part of the instruction instead of being broken up, and due to mem-reg, reg-mem and mem-mem memory/cache accesses. Most old 32-bit fixed-length RISC ISAs cannot keep up with AArch64 either: a PPC core needs to execute ~20% more instructions, and MIPS and SPARC cores are worse. PPC was more complex, with more powerful instructions and addressing modes than most of the other fat RISC ISAs like MIPS, SPARC, ARM, Alpha and PA-RISC. AArch64 also has better code density than these fat ISAs, even though it is mediocre. ARM did a very good job of optimizing AArch64 for performance, with the disadvantages being limited code density and large cores.
The x86 ISA instruction counts were higher than even the RISC-V compressed instruction counts. I observed the same issue in Vince Weaver's performance metric comparison; it is due to x86 having only 6 GP registers, which also greatly increases memory traffic. This is a severe handicap even for CISC, which does not need as many GP registers as RISC (the 68k's 16 GP registers are competitive with 32-bit fixed-length RISC ISAs with 32 GP registers in instruction count and memory traffic metrics). So how did x86, with only 6 GP registers, not only survive but end up winning the CISC vs RISC war against RISC ISAs with 32 GP registers?
RISC versus CISC: A Tale of Two Chip https://dl.acm.org/doi/pdf/10.1145/250015.250016 Quote:
The Pentium Pro @ 150MHz vs the Alpha 21164 @ 300MHz still had the Alpha in the lead by a small margin in integer performance and a strong lead in FP benchmarks; extreme RISC simplification of the core and ISA allowed double the clock speed of the P6 Pentium Pro. The first pic above shows the data traffic handicap due to too few GP registers, but it is not nearly as severe as the L1 instruction cache miss rate of the Alpha CPU in the 2nd pic. Both have an 8kiB instruction cache, although the Alpha cache is direct mapped to reduce access time at the high clock speed, versus the 4-way set associative P6 cache. The Alpha 21164 has a severe instruction supply bottleneck from its 8kiB L1 instruction cache, and enlarging it would also increase the access time. The severe cache miss problem was solved with innovation, as the Alpha 21164 introduced a 96kiB on-chip L2 cache. However, the on-chip L2 cache increased the transistor count of the chip to 9.3 million vs 5.5 million for the P6. The Alpha 21164 dissipated 50W vs 20W for the P6. The lower clocked P6 was more practical, with fewer transistors allowing a cheaper CPU that dissipates less heat. The deep 14-stage pipeline allowed the P6 to be clocked up to 200MHz, and later cores based on it much further, as most Intel x86-64 cores today descend from it. The x86 ISA was vulnerable due to having only 6 GP registers, as architect Bob Colwell wrote, but many RISC architects/developers were philosophical extremists who would not accept that their code density handicap was killing their ISAs and core competitiveness. Motorola had the x86-killer ISA in the 68k and developed the Pentium-killer 68060, but then tossed it all for RISC hype and PPC.
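The 6-versus-16 register handicap can be illustrated with a toy version of the graph-coloring allocation described in the ROMP compiler quote earlier. The interference graph is hypothetical, and real allocators also weigh spill costs and coalescing:

```python
def color_registers(interference, k):
    """Greedy graph coloring: assign each virtual register one of k
    physical registers, spilling when its neighbors already use all k."""
    assignment, spilled = {}, []
    # handle high-degree (most constrained) nodes first
    for node in sorted(interference, key=lambda n: -len(interference[n])):
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [c for c in range(k) if c not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)  # would be kept in memory instead

    return assignment, spilled

# a, b, c are simultaneously live; d only overlaps a
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"},
         "c": {"a", "b"}, "d": {"a"}}
print(color_registers(graph, k=16)[1])  # []: plenty of registers, no spills
print(color_registers(graph, k=2)[1])   # ['c']: too few registers force a spill
```

Every spill turns into extra loads and stores, which is the memory-traffic penalty the few-register x86 paid and the 16-register 68k and ROMP largely avoided.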
The article you just brought up makes note of in-order cores competing with OoO cores.
https://old.chipsandcheese.com/2025/01/30/a-risc-v-progress-check-benchmarking-p550-and-c910/ Quote:

...
Cortex A55 and A53 provide perspective on where in-order execution sits today. Neither core can get anywhere close to high performance client designs, but C910 and P550 have relatively small out-of-order engines. They also run at low clock speeds. Mediatek’s Genio 1200 has a particularly strong A55 implementation, with higher clock speeds and better DRAM latency than C910 and P550. Its Cortex A55 cores are able to catch C910 and P550 without full out-of-order execution.
This isn’t the first time an in-order core does surprisingly well against out-of-order ones. Back in 1996, AMD’s K5 featured 4-wide out-of-order execution and better per-clock performance than Intel’s 2-wide, in-order Pentium. Intel clocked the Pentium more than 30% faster, and came out top. Today’s situation with C910 and P550 against A55 has some parallels. A55 doesn’t win everywhere though. It loses to both RISC-V cores in SPEC CPU2017’s floating point suite. And a less capable in-order core like A53 can’t keep up despite running at higher clocks.
|
The simpler and cheaper but higher clocked in-order Cortex-A55 is outperforming the OoO RISC-V cores. Even if the C910 were clocked up to the same frequency as the Cortex-A55, the performance difference of the more complex OoO RISC-V core would not be worth it. The OoO SiFive P550 has better than in-order performance, good IPC and potential, but is clearly handicapped by having to execute too many instructions. RISC-V cores support 32-bit instructions without the compression, and optimizing for performance should already favor 32-bit instructions at the expense of the code-density-saving 16-bit instructions. I believe the simplistic non-compressed RISC-V ISA has a significant instructions-executed disadvantage compared to AArch64, and much of the code density advantage promoted by RISC-V studies when optimizing for size disappears when optimizing for performance. Many ISAs have this problem, including x86(-64), where optimizing for size gives good code density with increased instructions executed and memory traffic, which optimizing for performance reduces at the cost of worse code density. This is generally not a problem for the 68k, as all 16 GP registers can be used by 16-bit instructions. The 68k has decreased instructions executed, decreased memory traffic and very good code density, mostly without this compromise, unlike most other compressed ISAs.
Last edited by matthey on 26-Jul-2025 at 02:20 PM.
|
| | Status: Offline |
| | kolla
|  |
Re: The (Microprocessors) Code Density Hangout Posted on 26-Jul-2025 15:59:47
| | [ #377 ] |
| |
 |
Elite Member  |
Joined: 20-Aug-2003 Posts: 3559
From: Trondheim, Norway | | |
|
| @ppcamiga1
Quote:
ppcamiga1 wrote: @kolla
stop trolling start working on mui port it to unix
|
A friend of mine and some of his buddies did; they named it Qt. In the early days my A1200 was a test system for it, running NetBSD, aka unix.
Last edited by kolla on 26-Jul-2025 at 04:00 PM.
_________________ B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC |
| | Status: Offline |
| | Hammer
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 27-Jul-2025 2:07:03
| | [ #378 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6704
From: Australia | | |
|
| @matthey
Quote:
The simpler and cheaper but higher clocked in-order Cortex-A55 is outperforming the OoO RISC-V cores. Even if the C910 was clocked up to the same frequency as the Cortex-A55, the performance difference of the more complex OoO RISC-V core is not worth it
|
For embedded microcontroller workloads, the cheapest RISC-V implementation is good enough; hence, many single-purpose embedded vendors use RISC-V, e.g. in mass storage devices.
I have more of Nvidia's custom RISC-V cores in my gaming PC than x86-64v4 cores.
https://www.techpowerup.com/328026/nvidia-ships-over-one-billion-risc-v-cores-this-year-inside-its-accelerators-up-to-40-cores-per-chip
Each NVIDIA chip includes between 10 and 40 RISC-V cores, depending on the chip size and complexity. Some more complex designs, like GB200, require massive data coordination, meaning that more cores are needed to handle these requests and distribute them. This includes chip-to-chip interfaces, context switching, memory controller, camera handling, video codecs, display output, resource management, power management, and more
I have RTX 4090, RTX 4080 and RTX 5070 Ti. Blender 3D is dominated by RTX 4090's custom RISC-V.
SPEC CPU2017 is not targeted at embedded microcontrollers.
SPEC CPU2017 breakdown: https://i0.wp.com/old.chipsandcheese.com/wp-content/uploads/2025/01/p550_c910_specint.png?w=786&ssl=1 In the x264 workload, there is a large lead for the ARM Cortex A73 and Intel Goldmont.
Video NLE, 3D, and raytracing are heavy compute workloads, not non-ML Photoshop or word processing workloads. Quote:
From https://dl.acm.org/doi/pdf/10.1145/250015.250016
On the SPEC92 suite, the RISC system is 16% to 53% faster than the CISC system on the integer benchmarks, with a 39% higher SPECint92 rating. On the floating point benchmarks, the RISC system is 72% to 261% faster, with a 133% higher SPECfp92 rating. On the SPEC95 suite, the Alpha 21164 is 5% to 68% faster on the integer benchmarks with a 22% higher SPECint95 rating; and 53% to 200% faster on FP benchmarks with a 128% higher SPECfp95 rating.
The Alpha 21164 or EV5 became available in 1995 at processor frequencies of up to 333 MHz. In July 1996, the 21164 line was ramped to 500 MHz.
HP PA-8000
In March 1995, Intel released the Pentium 120 MHz. In June 1995, Intel released the Pentium 133 MHz. In November 1995, Intel released Pentium Pro 180 MHz and 200 MHz. HP PA-8000 (with PA-RISC 2.0 including multimedia 64-bit SIMD) was also released.
In January 1996, Pentium 150 MHz and 166 MHz were released. In June 1996, Pentium 200 MHz was released. In January 1997, Pentium MMX 166 MHz and 200 MHz were released. In June 1997, Pentium MMX 233 Mhz was released.
With PC clones, Intel has an advantage in volume production with fat Pentium SKUs.
For the 1995 release, your argument doesn't address the $50 BOM cost range for the Amiga Hombre's CD3D game console and A1200 replacement.
After comparing A3000's Phase 5 Cyberstorm 68060 @ 50Mhz and Cybergraphics 64 (S3 Trio 64V), my family purchased a Pentium 150 (overclocked to 166)/S3 Trio 64UV PC clone for Xmas Q4 1996. Quake, Duke Nukem 3D, and Tomb Raider are the main considerations.
My Quake estimate for the 68060 @ 50MHz was confirmed when ClickBoom released the Amiga Quake port. My Pentium 166 PC clone murdered the Phase 5 upgraded A3000. The recent Warp1260 wouldn't change this view.
Last edited by Hammer on 27-Jul-2025 at 08:28 AM.
_________________
|
| | Status: Offline |
| | ppcamiga1
|  |
Re: The (Microprocessors) Code Density Hangout Posted on 27-Jul-2025 9:32:19
| | [ #379 ] |
| |
 |
Super Member  |
Joined: 23-Aug-2015 Posts: 1145
From: Unknown | | |
|
| @kolla
no it was not mui stop trolling and start working on mui
|
| | Status: Offline |
| | pixie
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 27-Jul-2025 10:19:34
| | [ #380 ] |
| |
 |
Elite Member  |
Joined: 10-Mar-2003 Posts: 3539
From: Figueira da Foz - Portugal | | |
|
| | | Status: Offline |
| |