kolla
Re: The (Microprocessors) Code Density Hangout | Posted on 24-Aug-2025 21:44:24 | [ #401 ]
Elite Member | Joined: 20-Aug-2003 | Posts: 3507 | From: Trondheim, Norway

Suppose this thread is as good as any to make people aware...
The fundraiser to bring the AC68080 out of its "EC" state and deliver a "full" AC68080 with an MMU was met within just a few days. Interesting times ahead, though I suspect it is much too late to really draw the attention of "the right stuff", and Gunnar has of course done himself a disservice by running around insisting that "nothing and no-one needs an MMU", just as he did earlier with "nothing and no-one needs an FPU".
https://www.gofundme.com/f/memory-management-unit-mmu-for-apollo-v4-series
Hammer
Re: The (Microprocessors) Code Density Hangout | Posted on 25-Aug-2025 1:42:29 | [ #402 ]
Elite Member | Joined: 9-Mar-2003 | Posts: 6582 | From: Australia

@kolla
Quote:
kolla wrote: Suppose this thread is as good as any to make people aware...
The fundraiser to bring the AC68080 out of its "EC" state and deliver a "full" AC68080 with an MMU was met within just a few days. Interesting times ahead, though I suspect it is much too late to really draw the attention of "the right stuff", and Gunnar has of course done himself a disservice by running around insisting that "nothing and no-one needs an MMU", just as he did earlier with "nothing and no-one needs an FPU".
https://www.gofundme.com/f/memory-management-unit-mmu-for-apollo-v4-series
|
AC68080 has increased general-purpose performance, and SAGA comes with RTG's chunky graphics. Why not use Linux 68K on it?
There's Vamos, which allows AmigaOS command-line software to run on Linux. There's potential in this idea, i.e. an NT'ed AmigaOS.
bhabbott
Re: The (Microprocessors) Code Density Hangout | Posted on 25-Aug-2025 2:25:39 | [ #403 ]
Cult Member | Joined: 6-Jun-2018 | Posts: 567 | From: Aotearoa

@kolla
Gunnar won't even put Apollo Shield into V2, so Vampire is now dead to me.
Hammer
Re: The (Microprocessors) Code Density Hangout | Posted on 25-Aug-2025 2:26:29 | [ #404 ]
Elite Member | Joined: 9-Mar-2003 | Posts: 6582 | From: Australia

@matthey
Quote:
For a 16-bit base VLE, the other common option is tiered registers with the lower 3b reg encoding of 16-bit instructions accessing the first 8 registers and 32-bit encodings using 4b reg encodings. This initially appears cleaner as the visible Dn/An register split can disappear but commonly used special registers like the SP need to be mapped to the low 8 registers which does not look as clean, leaves fewer registers available and I do not believe the code density is as good. Some of these ISAs create more instructions like PUSH and POP in the case of the SP but this reduces orthogonality. The x86-64 ISA is similar but poorly implemented as a couple of the lower 8 registers are not GP despite having PUSH and POP instructions leaving only 6 GP registers before needing a prefix to access 8 more GP registers. Fortunately for x86-64, CISC cores with mem-reg and reg-mem memory accesses usually do not need as many registers and load-to-use stalls are avoided, one of the keys why x86 with 6 GP registers stayed ahead in performance of fat RISC with 32 GP registers like Alpha, MIPS, PPC, etc. Fat RISC code density was another reason. RISC fanatics are slow learners but the failures eventually disappeared leaving more competitive RISC architectures. |
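To put rough numbers on the encoding-space trade-off described in that quote, here is a back-of-the-envelope sketch (the helper function is purely illustrative and not tied to any specific ISA): a 16-bit format with two 3-bit register fields keeps 10 bits, i.e. 1024 encodings, for the opcode and everything else, while letting both fields reach all 16 registers drops that to 256.

# Opcode space left in a 16-bit instruction word after two register fields.
# Illustrative only; real encodings also spend bits on size, mode, etc.
def opcode_space(word_bits, reg_fields, reg_field_bits):
    remaining = word_bits - reg_fields * reg_field_bits
    return remaining, 2 ** remaining

print(opcode_space(16, 2, 3))  # (10, 1024) - 3-bit fields, only 8 registers reachable
print(opcode_space(16, 2, 4))  # (8, 256)   - 4-bit fields reach all 16 registers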
Unlike the 68060 vs 68LC060 situation, the Pentium guaranteed x87 registers for baseline PCs, a baseline that major AAA PC games such as Tomb Raider and Quake enforced in practice.
PC DOS Tomb Raider runs poorly on the x87-less AMD 486SX2-66 (overclocked to 80 MHz) with a 3dfx Voodoo: https://www.youtube.com/watch?v=XHqLYzqZciM Various high-clock-speed 486 models are compared with PC DOS Tomb Raider.
IA-32's 8-GPR limitation pushed x86 implementations toward fast data transfers with the x87 and XMM registers.
ALU operations that work directly on memory operands lessened the demand on the small GPR count.
The actual microarchitecture implementation is equally important, e.g. the AMD K5 and Cyrix 6x86 have stronger integer pipelines, while the Intel Pentium has a stronger x87 pipeline.
https://barefeats.com/doom3.html
MAC GAME PERFORMANCE BRIEFING FROM THE DOOM 3 DEVELOPERS
Glenda Adams, Director of Development at Aspyr Media, has been involved in Mac game development for over 20 years. I asked her to share a few thoughts on what attempts they had made to optimize Doom 3 on the Mac and what barriers prevented them from getting it to run as fast on the Mac as in comparable Windows PCs. Here's what she wrote:
"Just like the PC version, timedemos should be run twice to get accurate results. The first run the game is caching textures and other data into RAM, so the timedemo will stutter more. Running it immediately a second time and recording that result will give more accurate results.
The performance differences you see between Doom 3 Mac and Windows, especially on high end cards, is due to a lot of factors (in general order from smallest impact to largest):
1. PowerPC architectural differences, including a much higher penalty for float to int conversion on the PPC. This is a penalty on all games ported to the Mac, and can't be easily fixed. It requires re-engineering much of the game's math code to keep data in native formats more often. This isn't 'bad' coding on the PC -- they don't have the performance penalty, and converting results to ints saves memory and can be faster in many algorithms on that platform. It would only be a few percentage points that could be gained on the Mac, so its one of those optimizations that just isn't feasible to do for the speed increase.
2. Compiler differences. gcc, the compiler used on the Mac, currently can't do some of the more complex optimizations that Visual Studio can on the PC. Especially when inlining small functions, the PC has an advantage. Add to this that the PowerPC has a higher overhead for functional calls, and not having as much inlining drops frame rates another few percentage points.
Microsoft's Visual Studio is first-party software for the Wintel platform.
cdimauro
Re: The (Microprocessors) Code Density Hangout | Posted on 25-Aug-2025 4:27:03 | [ #405 ]
Elite Member | Joined: 29-Oct-2012 | Posts: 4506 | From: Germany

@bhabbott
Quote:
bhabbott wrote: @cdimauro
Quote:
cdimauro wrote:
An 11%-13% code density improvement over Thumb-2 is a large difference. |
No, it's trifling. |
No, you simply (!) don't know what you're talking about.
It was already reported several times. There are studies which show that a 25-30% code density improvement is roughly equivalent to a system with HALF the code cache size. I repeat again for YOUR benefit: HALF the code cache.
11-13% is around HALF of that code density improvement. You should now figure out for yourself whether that is "trifling" (SIC!)...
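A rough sketch of the arithmetic behind these percentages (it assumes the common reading that an X% code density improvement means X% smaller code, and it restates the RISC-V RVC rule of thumb quoted later in this thread; the helper is purely illustrative):

# What an X% code density improvement means for binary size.
def relative_size(density_improvement):
    return 1.0 - density_improvement  # "X% better density" read as "X% smaller code"

for x in (0.11, 0.13, 0.25, 0.30):
    print(f"{x:.0%} denser -> binary is {relative_size(x):.0%} of the original size")

# Per the RVC manual quoted later in the thread: ~25-30% fewer fetched
# instruction bits cut I-cache misses by ~20-25%, "roughly the same performance
# impact as doubling the instruction cache size". An 11-13% reduction is about
# half as large a saving in fetched bits.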
Quote:
Actually, x86-64 was sporting the best results:
 |
Actually there's nothing in it (even if 'instruction count' has any relevance). |
It's the other very important metric when talking about computer architectures...
And again, it was already reported several times. Quote:
In the real world bloat swamps these trifling differences. |
In a sensible world, people should only talk about things they actually know. Quote:
I must correct you: it is only the ignorant who aren't interested. Quote:
I'll reveal a secret to you: 64-bit has limits. 2^64, precisely (but that's high-order math). Quote:
and no reason to rein in the bloat. |
And here comes again the magic word which Bruce repeats like a parrot whenever he's not able to accept reality (which is very different from the cave where he's living): bloat. |
cdimauro
Re: The (Microprocessors) Code Density Hangout | Posted on 25-Aug-2025 4:28:37 | [ #406 ]
Elite Member | Joined: 29-Oct-2012 | Posts: 4506 | From: Germany

@kolla
Quote:
kolla wrote: Suppose this thread is as good as any to make people aware...
The fundraiser to bring the AC68080 out of its "EC" state and deliver a "full" AC68080 with an MMU was met within just a few days. Interesting times ahead, though I suspect it is much too late to really draw the attention of "the right stuff", and Gunnar has of course done himself a disservice by running around insisting that "nothing and no-one needs an MMU", just as he did earlier with "nothing and no-one needs an FPU".
https://www.gofundme.com/f/memory-management-unit-mmu-for-apollo-v4-series |
I know, because I follow his forum. Anyway, it's off-topic: here we discuss code density (and memory footprint).
There's already a thread which was specifically created to talk about computer microarchitectures and similar things. |
cdimauro
Re: The (Microprocessors) Code Density Hangout | Posted on 25-Aug-2025 4:33:02 | [ #407 ]
Elite Member | Joined: 29-Oct-2012 | Posts: 4506 | From: Germany

@Hammer
Quote:
Hammer wrote: @matthey
Quote:
For a 16-bit base VLE, the other common option is tiered registers with the lower 3b reg encoding of 16-bit instructions accessing the first 8 registers and 32-bit encodings using 4b reg encodings. This initially appears cleaner as the visible Dn/An register split can disappear but commonly used special registers like the SP need to be mapped to the low 8 registers which does not look as clean, leaves fewer registers available and I do not believe the code density is as good. Some of these ISAs create more instructions like PUSH and POP in the case of the SP but this reduces orthogonality. The x86-64 ISA is similar but poorly implemented as a couple of the lower 8 registers are not GP despite having PUSH and POP instructions leaving only 6 GP registers before needing a prefix to access 8 more GP registers. Fortunately for x86-64, CISC cores with mem-reg and reg-mem memory accesses usually do not need as many registers and load-to-use stalls are avoided, one of the keys why x86 with 6 GP registers stayed ahead in performance of fat RISC with 32 GP registers like Alpha, MIPS, PPC, etc. Fat RISC code density was another reason. RISC fanatics are slow learners but the failures eventually disappeared leaving more competitive RISC architectures. |
Unlike the 68060 vs 68LC060 situation, the Pentium guaranteed x87 registers for baseline PCs, a baseline that major AAA PC games such as Tomb Raider and Quake enforced in practice.
PC DOS Tomb Raider runs poorly on the x87-less AMD 486SX2-66 (overclocked to 80 MHz) with a 3dfx Voodoo: https://www.youtube.com/watch?v=XHqLYzqZciM Various high-clock-speed 486 models are compared with PC DOS Tomb Raider.
IA-32's 8-GPR limitation pushed x86 implementations toward fast data transfers with the x87 and XMM registers.
ALU operations that work directly on memory operands lessened the demand on the small GPR count.
The actual microarchitecture implementation is equally important, e.g. the AMD K5 and Cyrix 6x86 have stronger integer pipelines, while the Intel Pentium has a stronger x87 pipeline.
https://barefeats.com/doom3.html
MAC GAME PERFORMANCE BRIEFING FROM THE DOOM 3 DEVELOPERS
Glenda Adams, Director of Development at Aspyr Media, has been involved in Mac game development for over 20 years. I asked her to share a few thoughts on what attempts they had made to optimize Doom 3 on the Mac and what barriers prevented them from getting it to run as fast on the Mac as in comparable Windows PCs. Here's what she wrote:
"Just like the PC version, timedemos should be run twice to get accurate results. The first run the game is caching textures and other data into RAM, so the timedemo will stutter more. Running it immediately a second time and recording that result will give more accurate results.
The performance differences you see between Doom 3 Mac and Windows, especially on high end cards, is due to a lot of factors (in general order from smallest impact to largest):
1. PowerPC architectural differences, including a much higher penalty for float to int conversion on the PPC. This is a penalty on all games ported to the Mac, and can't be easily fixed. It requires re-engineering much of the game's math code to keep data in native formats more often. This isn't 'bad' coding on the PC -- they don't have the performance penalty, and converting results to ints saves memory and can be faster in many algorithms on that platform. It would only be a few percentage points that could be gained on the Mac, so its one of those optimizations that just isn't feasible to do for the speed increase.
2. Compiler differences. gcc, the compiler used on the Mac, currently can't do some of the more complex optimizations that Visual Studio can on the PC. Especially when inlining small functions, the PC has an advantage. Add to this that the PowerPC has a higher overhead for functional calls, and not having as much inlining drops frame rates another few percentage points.
Microsoft's Visual Studio is first-party software for the Wintel platform. |
Same as above to kolla, plus some gentle reminders:
Quote:
Hammer wrote: @matthey
Quote:
The 32-bit 68060 has 16 GP integer registers, good orthogonality, a good FPU ISA with 8 GP FPU registers and it was obviously better than the in-order P5 Pentium equivalent. Motorola pulled the plug on the 68k for a RISC ISA more like Alpha though. I guess they could not read the writing on the wall.
|
You ignored that x86 integer register use cases involve both the GPRs and the x87 registers. |
Out of curiosity, what do you mean by that?
Quote:
Hammer wrote: @cdimauro
Quote:
cdimauro wrote: @Hammer
You continue to report it, but there's not a single benchmark using this VLE for embedded (and solely there, it looks like).
That's despite the fact that I've already asked you several times.
Since there's nothing yet, I wonder who is using it. If anyone ever did it...
|
1. I don't care about NXP/STM's PowerPC VLE vs 68K. PPC fanboys can cover their CPU horse. |
I haven't talked about PowerPC vs 68k: I've ONLY talked about VLE.
You've already reported it several times, in the context of code density, yet there's not a single number backing the credibility of this PowerPC extension on this key metric (which is THE key metric when talking about embedded; in fact, it's an extension for the embedded market).
BTW, I've just started reading this manual, and I've immediately found something which made me laugh. That's another set of engineers who were living in a parallel world. I leave it to you as an exercise to figure out what I'm talking about. Hint: it's at the very beginning of the documentation. Quote:
3. For VLE PPC, NXP/STM claims 30 percent code density improvement. |
Source? |
matthey
Re: The (Microprocessors) Code Density Hangout | Posted on 26-Aug-2025 1:15:11 | [ #408 ]
Elite Member | Joined: 14-Mar-2007 | Posts: 2821 | From: Kansas

cdimauro Quote:
No, you simply (!) don't know what you're talking about.
It was already reported several times. There are studies which show that a 25-30% code density improvement is roughly equivalent to a system with HALF the code cache size. I repeat again for YOUR benefit: HALF the code cache.
11-13% is around HALF of that code density improvement. You should now figure out for yourself whether that is "trifling" (SIC!)...
|
The RISC-V compressed instruction set manual and related research talk about code density and the number of instructions executed, along with some code density history.
The RISC-V Compressed Instruction Set Manual https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf Quote:
Variable-length instruction sets have long been used to improve code density. For example, the IBM Stretch, developed in the late 1950s, had an ISA with 32-bit and 64-bit instructions, where some of the 32-bit instructions were compressed versions of the full 64-bit instructions. Stretch also employed the concept of limiting the set of registers that were addressable in some of the shorter instruction format. The later IBM 360 architecture supported a simple variable-length instruction encoding with 16-bit, 32-bit, or 48-bit instruction formats.
In 1963, CDC introduced the Cray-designed CDC 6600, a precursor to RISC architectures that introduced a register-rich load-store architecture with instructions of two lengths, 15-bits and 30-bits. The later Cray-1 design used a very similar instruction format, with 16-bit and 32-bit instruction lengths.
|
Some RISC fans like to claim the CDC 6600 as being RISC-like because the ISA is simplified. It does not have load/store instructions; instead, memory accesses are performed by writing to address registers. It also has three different types of registers, X0-X7, A0-A7 and B0-B7, and a VLE.
The RISC-V Compressed Instruction Set Manual https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf Quote:
The initial RISC ISAs from the 1980s all picked performance over code size, which was reasonable for a workstation environment, but not for embedded systems. Hence, both ARM and MIPS subsequently made versions of the ISAs that offered smaller code size by offering an alternative 16-bit wide instruction set instead of the standard 32-bit wide instructions. The compressed RISC ISAs reduced code size relative to their starting points by about 25–30%, yielding code that was significantly smaller than 80x86. This result surprised some, as their intuition was that the variable-length CISC ISA should be smaller than RISC ISAs that offered only 16-bit and 32-bit formats.
|
It should read: fat RISC ISAs with a large code size picked simplicity over performance. Compressed RISC ISAs were handicapped by fat RISC ISAs, much like the 80x86 ISA was handicapped by starting as a 16-bit ISA while maintaining compatibility with the 8-bit 808x ISA. The smart thing to do would have been to start with a compressed 32-bit ISA like the 68k introduced in 1979. RISC-V finally got it right on the 5th attempt in 2010, discarding earlier RISC mistakes but falling short of 68k code density.
The RISC-V Compressed Instruction Set Manual https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf Quote:
Since the original RISC ISAs did not leave sufficient opcode space free to include these unplanned compressed instructions, they were instead developed as complete new ISAs. This meant compilers needed different code generators for the separate compressed ISAs. The first compressed RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only a fixed 16-bit instruction size, which gave good reductions in static code size but caused an increase in dynamic instruction count, which led to lower performance compared to the original fixed-width 32-bit instruction size. This led to the development of a second generation of compressed RISC ISA designs with mixed 16-bit and 32-bit instruction lengths (e.g., ARM Thumb2, microMIPS, PowerPC VLE), so that performance was similar to pure 32-bit instructions but with significant code size savings. Unfortunately, these different generations of compressed ISAs are incompatible with each other and with the original uncompressed ISA, leading to significant complexity in documentation, implementations, and software tools support.
|
An increase in "dynamic instruction count" led to "lower performance" for 16-bit fixed-length RISC encodings, but it also applies to 16-bit and 32-bit VLEs, which RISC-V developers still seem not to recognize. The BA2 and NanoMIPS designers realized that 48-bit encodings are necessary for 32-bit immediates/displacements, to keep from breaking instructions apart and thus increasing the instruction count and reducing performance. They still do not match CISC performance with its mem-reg and reg-mem single-cycle memory/cache accesses and multiple scaled immediates/displacements.
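To make the immediate/displacement point concrete, a small sketch of the bytes and dynamic instruction count needed to materialize one 32-bit constant (the three cases are generic shapes of the encodings discussed above; the 6-byte 68k MOVE.L #imm32,Dn figure is the standard one, the others are the generic 2x32-bit split and the 48-bit long encoding):

# Cost of materializing a single 32-bit immediate, in instructions and bytes.
cases = {
    "two 32-bit instructions (upper half, then lower half)": (2, 2 * 4),
    "one 48-bit instruction (BA2/NanoMIPS-style long encoding)": (1, 6),
    "one 68k MOVE.L #imm32,Dn (16-bit opcode + 32-bit extension)": (1, 2 + 4),
}
for desc, (ops, size) in cases.items():
    print(f"{desc}: {ops} instruction(s), {size} bytes")

The split case costs both an extra dynamic instruction and two extra bytes, which is the combination of effects the paragraph above is pointing at.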
The RISC-V Compressed Instruction Set Manual https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf Quote:
Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently supports a compressed instruction format. It is surprising that the most popular 64-bit ISA for mobile platforms (ARM v8) does not include a compressed instruction format given that static code size and dynamic instruction fetch bandwidth are important metrics. Although static code size is not a major concern in larger systems, instruction fetch bandwidth can be a major bottleneck in servers running commercial workloads, which often have a large instruction working set.
|
If the caches are much larger than necessary with expensive high end hardware, then performance is not reduced as much as for more affordable hardware.
The RISC-V Compressed Instruction Set Manual https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf Quote:
Benefiting from 25 years of hindsight, RISC-V was designed to support compressed instructions from the outset, leaving enough opcode space for RVC to be added as a simple extension on top of the base ISA (along with many other extensions). The philosophy of RVC is to reduce code size for embedded applications and to improve performance and energy-efficiency for all applications due to fewer misses in the instruction cache. Waterman shows that RVC fetches 25%-30% fewer instruction bits, which reduces instruction cache misses by 20%-25%, or roughly the same performance impact as doubling the instruction cache size.
|
This is the RISC-V code density research we often refer to, which bhabbott somehow missed. The 68k has about 50% better code density than fat "original RISC ISAs" like MIPS, SPARC, Alpha, ARM, PA-RISC and PPC, so their instruction caches needed to be roughly quadruple the size to match the instruction cache performance of the 68k. The 68060's 8kiB instruction cache had roughly the instruction cache performance of a PPC604e with a 32kiB instruction cache, wasting 24kiB of instruction cache compared to the 68060. At 6 transistors per bit, that 24kiB of instruction cache uses 1,179,648 transistors.
CPU     | pipeline | caches      | transistors | cost
68060   | 8-stage  | 8kiB/8kiB   | 2,530,000   | ?
PPC603e | 4-stage  | 16kiB/16kiB | 2,600,000   | $30
PPC604e | 6-stage  | 32kiB/32kiB | 5,100,000   | $60
1,179,648/5,100,000 = 23% of the PPC604e's transistors were wasted on the extra I-cache vs the 68060
1,179,648/2,530,000 = 47%: the PPC604e's extra I-cache equals 47% of the 68060's entire transistor budget
The PPC604e with 5.1 million transistors was estimated to have twice the manufacturing cost of the PPC603e with 2.6 million transistors. The PPC604e could likely have had a ~23% lower manufacturing cost with an 8kiB I-cache, lowering the cost from $60 to $46.12. The price is usually around three times the cost, so the CPU price might have dropped by $41.63 if PPC had 68k code density. The PPC data is taken from the following Microprocessor Report. The problem with PPC was not cost but performance, which did not compete against x86.
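Spelled out as a quick sketch that just re-runs the arithmetic above (the 6-transistors-per-bit SRAM cell and the price-is-roughly-3x-cost ratio are the assumptions already stated in this post):

# Cache transistor and cost arithmetic from the figures above.
KIB = 1024
extra_icache_bytes = 24 * KIB                 # 32kiB (PPC604e) - 8kiB (68060)
transistors_per_bit = 6                       # 6T SRAM cell
extra_transistors = extra_icache_bytes * 8 * transistors_per_bit
print(extra_transistors)                      # 1,179,648

print(extra_transistors / 5_100_000)          # ~0.23 of the PPC604e's budget
print(extra_transistors / 2_530_000)          # ~0.47 of a whole 68060's budget

ppc604e_cost = 60.0                           # estimated manufacturing cost, $
small_icache_cost = ppc604e_cost * (1 - extra_transistors / 5_100_000)
print(small_icache_cost)                      # ~$46.1
print((ppc604e_cost - small_icache_cost) * 3) # ~$41.6 price drop at price ~ 3x cost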
Arthur Revitalizes PowerPC Line https://websrv.cecs.uci.edu/~papers/mpr/MPR/ARTICLES/110203.PDF Quote:
Arthur’s (4-stage PPC G3) low manufacturing cost, however, lets IBM and Motorola continue to undercut Intel’s prices. We expect Klamath to initially debut at a list price of $700–$800. In contrast, Arthur is likely to appear at $400–$500. Apple will pay far less, of course, while even Intel’s best customers don’t get much of a discount off list. Intel will bring down the price of Klamath over several quarters, but Arthur’s cost structure will easily support a price well below Klamath’s.
This positioning appears advantageous but is unlikely to boost PowerPC’s prospects versus Intel. Without even getting into Apple’s problems (see 1101ED.PDF), a major failure of PowerPC is that it has never been able to deliver a large performance advantage over Intel. Although lower CPU prices are impressive from a technical standpoint, the cost savings are usually eaten up by higher system margins and component costs. Arthur keeps pace with Intel but doesn’t appear to change this basic equation. As noted, Apple’s one opportunity for performance leadership will come in the notebook market, but the company must respond quickly when this opportunity knocks.
|
PPC's poor cache efficiency and shallow pipelines had synergies that helped sink PPC. The PPC604(e) had a 6-stage pipeline which could be clocked up more than the PPC603(e)/G3, but the PPC604e did not clock as high as it otherwise could have after doubling the caches from 16kiB to 32kiB. Larger caches have a slower access time, which also reduces performance and/or increases pipeline latency with a deeper pipeline.
Motorola likely could have pipelined the instruction and data cache accesses by adding pipeline stages to access the cache over 2 or more stages, though this increases the load-to-use penalty which most load/store RISC designs suffer from, as well as the branch misprediction penalty. Motorola stayed with the shallow pipeline design for the PPC G3 (Arthur), based on the PPC603, for this reason. Most CISC designs do not suffer from load-to-use stalls, and Motorola added 2-stage instruction cache and data cache accesses with 32kiB instruction and data caches for the ColdFire V5, based on the 68060 design. The instruction pipeline only increased by 1 stage, from the 8-stage 68060 to the 9-stage ColdFire V5, by combining stages. Newer chip fab processes decrease distances, allowing more logic, so 32kiB cache accesses in a single stage became possible later. Cache access times are still very important and larger caches have longer access times at all levels.
Good code density reduces system costs due to less memory being needed. Fewer memory accesses reduce power, allowing cheaper power supplies and reducing the cost of cooling. To summarize, code density advantages include the following.
1. fewer transistors for caches allow cheaper chip costs and/or better cache performance
2. smaller caches allow better performance and/or less latency
3. memory costs are reduced with smaller memory footprint systems
4. power supply and cooling costs are reduced by fewer memory accesses reducing power
These are just the code density advantages; CISC has other advantages as well. It is amazing that all "original RISC ISA" developers did not understand what RISC-V research discovered much later about just the cache performance, enough by itself to sink fat RISC. I guess it is not surprising that bhabbott did not understand when DEC Alpha, HP PA-RISC and Motorola PPC developers did not see it. Credit to Intel for abandoning the i960 and StrongARM to return to x86 and to AMD for staying with x86(-64) instead of sailing with the Itanic. Motorola sure never looked back at the 68k after castrating their baby into ColdFire and throwing it out with the bathwater.
Hammer
Re: The (Microprocessors) Code Density Hangout | Posted on 26-Aug-2025 4:41:05 | [ #409 ]
Elite Member | Joined: 9-Mar-2003 | Posts: 6582 | From: Australia

Hammer
Re: The (Microprocessors) Code Density Hangout | Posted on 26-Aug-2025 5:20:29 | [ #410 ]
Elite Member | Joined: 9-Mar-2003 | Posts: 6582 | From: Australia

@matthey
https://youtu.be/s1G2caOSi9M?t=780 Tomb Raider (1996) 320x200p benchmarks for various 486DX and Pentium Overdrive CPUs. Frame rates are capped at around 30 fps.
Pentium Overdrive 83 MHz = 29.1 fps
AMD 5x86 160 MHz = 29.1 fps
AMD 5x86 133 MHz = 27.5 fps
AMD 486DX4 100 MHz = 22.9 fps
Cyrix 5x86 100 MHz (50 MHz bus x2) = 27.6 fps
Cyrix 5x86 Enhanced 100 MHz = 27.5 fps
Cyrix 5x86 100 MHz = 23.9 fps
Cyrix 486 100 MHz = 20.4 fps
(The Cyrix 5x86 primarily uses Socket 3, a 168-pin PGA.)
Intel 486DX4 100 MHz = 23.7 fps
Intel 486DX2 66 MHz (write-back cache) = 15.8 fps
These are 32-bit FSB platforms.
----------
https://www.youtube.com/watch?v=KiNTp1jlrR4 OpenLara running on an A1200 with a 68060 @ 50 MHz.
From https://eab.abime.net/showthread.php?t=120230
The Vampire AC68080 V2 easily reaches the 30 fps cap.
The old Apollo 1240 @ 40 MHz is around 12 fps. The 68060 @ 50 MHz is not delivering 2X over the 68040 @ 40 MHz. A Trinity1240 (68040 @ 33 MHz) with semi-modern SDR memory is around 12 fps.
The Trinity1240/1260 project can also support a 68060 via a jumper.
http://www.b737.org.uk/fmc.htm The recent FMC Model 2907C1 has an MC68040 processor running at 60 MHz (30 MHz bus clock speed).
kolla
Re: The (Microprocessors) Code Density Hangout | Posted on 29-Aug-2025 1:20:29 | [ #411 ]
Elite Member | Joined: 20-Aug-2003 | Posts: 3507 | From: Trondheim, Norway

@Hammer
Quote:
Hammer wrote:
AC68080 has increased general-purpose performance, and SAGA comes with RTG's chunky graphics. Why not use Linux 68K on it?
|
Why are you asking me? I've been running Linux on 68k for more than 3 decades, on both real and emulated hardware. The ability to run Linux or NetBSD is the only thing that would make me perhaps buy a V4 eventually. But how likely is that, really? Who would maintain Linux and NetBSD support for the 68080, and how? Is the "Apollo team" capable and interested? Are the Linux/68k and NetBSD/68k teams even interested at this point?
(What do you mean by "RTG's chunky graphics"? RTG is software, an API for AmigaOS, and SAGA is not the 68080. Granted, a V4 has both the 68080 and SAGA, and SAGA does chunky graphics, but RTG is irrelevant for Linux.)
cdimauro
Re: The (Microprocessors) Code Density Hangout | Posted on 2-Sep-2025 5:35:41 | [ #412 ]
Elite Member | Joined: 29-Oct-2012 | Posts: 4506 | From: Germany

@Hammer, @kolla: what's not clear to you about this thread being about code density?
As I've already reported, there's another thread which is better suited for discussions about microarchitectures and the like: https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=45544&forum=17
Or, you can certainly open new threads talking about those topics.
@matthey
Quote:
matthey wrote: Newer chip fab processes decrease distances allowing more logic so 32kiB cache accesses in a single stage became possible later. Cache access times are still very important and larger caches have longer access times at all levels. |
Only one point here: this depends on the cache line granularity (size).
In fact, you can double the cache size, but if you also double the cache line size, then the access time is the same (keeping all other factors equal, of course). The price to pay is more traffic -> more transistors for the buffers & more power drawn, but those are other factors. Quote:
Good code density reduces system costs due to less memory being needed. Fewer memory accesses reduce power, allowing cheaper power supplies and reducing the cost of cooling. To summarize, code density advantages include the following.
1. fewer transistors for caches allow cheaper chip costs and/or better cache performance
2. smaller caches allow better performance and/or less latency
3. memory costs are reduced with smaller memory footprint systems
4. power supply and cooling costs are reduced by fewer memory accesses reducing power
These are just the code density advantages; CISC has other advantages as well. |
It's a good summary, thanks. Quote:
It is amazing that all "original RISC ISA" developers did not understand what RISC-V research discovered much later about just the cache performance, enough by itself to sink fat RISC. I guess it is not surprising that bhabbott did not understand when DEC Alpha, HP PA-RISC and Motorola PPC developers did not see it. Credit to Intel for abandoning the i960 and StrongARM to return to x86 and to AMD for staying with x86(-64) instead of sailing with the Itanic. Motorola sure never looked back at the 68k after castrating their baby into ColdFire and throwing it out with the bathwater. |
As I've already said in one of my last replies to minator, at the time the only relevant metric for chip vendors was performance, and nothing else.
Code density wasn't relevant, because they wanted to win the speed race, and because of that not even the price was important (they were packing tons of transistors onto their chips only to get more performance), nor was power consumption relevant.
Things worked differently in the embedded and handheld console market, and the history of the Nintendo GameBoy Advance and ARM's Thumb was a clear indication of how important code density was.
It became important for general-purpose processors & architectures once they hit the wall of scaling with newer process nodes and frequencies, and once mobile devices made their impact on our lives, which led them to rediscover how important this key factor is. |
cdimauro
Re: The (Microprocessors) Code Density Hangout | Posted on 2-Sep-2025 5:42:14 | [ #413 ]
Elite Member | Joined: 29-Oct-2012 | Posts: 4506 | From: Germany

@Hammer
Quote:
It's just a number backed by not a single piece of supporting data -> irrelevant.
Pay attention: if we consider that number relevant, then it would mean that VLE was/is performing WAY BETTER than Thumb-2, BA2 and NanoMIPS, getting very, very close to the 8086 results. Just take a look at the benchmarks reported on the first page of this thread, apply this 30% to all the PowerPC data, and you can figure out for yourself how completely unrealistic it would be. In fact, VLE doesn't even support 32-bit immediates, and despite that it would show far better results than NanoMIPS and even BA2 (which is the best architecture in terms of pure code density; the best 32-bit architecture, to be more precise).
Freescale has to show where this number comes from. |
matthey
Re: The (Microprocessors) Code Density Hangout | Posted on 2-Sep-2025 18:51:01 | [ #414 ]
Elite Member | Joined: 14-Mar-2007 | Posts: 2821 | From: Kansas

cdimauro Quote:
Only one point here: this depends on the cache line granularity (size).
In fact, you can double the cache size, but if you also double the cache line size, then the access time is the same (keeping all other factors equal, of course). The price to pay is more traffic -> more transistors for the buffers & more power drawn, but those are other factors.
|
As a rule of thumb, the smaller the cache, the lower the cache access latency. There are other cache characteristics which affect access time, including cache associativity and the cache line size. I expect the cache line size to have less of a direct effect on cache hit access time than cache size and associativity. Larger cache line sizes matter more for cache misses, where the cache access time is less important than higher-level cache and memory access times. Larger cache lines increase conflict misses, which is usually compensated for with more cache associativity, increasing cache access latency despite the performance advantages.
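As a small illustration of that organization argument (a generic set-associative cache model with example sizes only, not any particular CPU): doubling capacity alone doubles the number of sets the index must select, while doubling capacity and line size together keeps the set count, and hence much of the lookup structure, unchanged, which is the point cdimauro raised.

# Sets and address-bit split for a set-associative cache.
def cache_geometry(size_bytes, line_bytes, ways, addr_bits=32):
    sets = size_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits

print(cache_geometry(16 * 1024, 32, 4))  # 16kiB, 32B lines: 128 sets
print(cache_geometry(32 * 1024, 32, 4))  # double the size only: 256 sets
print(cache_geometry(32 * 1024, 64, 4))  # double size and line size: 128 sets again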
cdimauro Quote:
As I've already said in one of my last replies to minator, at the time the only relevant metric for chip vendors was performance, and nothing else.
Code density wasn't relevant, because they wanted to win the speed race, and because of that not even the price was important (they were packing tons of transistors onto their chips only to get more performance), nor was power consumption relevant.
Things worked differently in the embedded and handheld console market, and the history of the Nintendo GameBoy Advance and ARM's Thumb was a clear indication of how important code density was.
It became important for general-purpose processors & architectures once they hit the wall of scaling with newer process nodes and frequencies, and once mobile devices made their impact on our lives, which led them to rediscover how important this key factor is.
|
Right. Early RISC was a clock speed race with, in race car terms, low energy density fuel.
Methanol is great for a drag race. A fire-breathing 1.3L engine in an RX-7 can run a 7.9s@177mph 1/4 mile (400m), and this is not the most powerful or quickest 13B car or even the quickest 13B RX-7 in the world, but I chose it because the car info is given.
Matt Esplan Runs a 7!! | ESPYFAB RACING 13B TURBO FD RX7 Drag Car | MAZDA | FullBOOST | Drag Racing https://youtu.be/fVVTPf-kESo?t=272
It has 12x1600cc injectors to supply the methanol, where stock for gas is 2x550cc and 2x850cc (my mildly modified 1993 RX-7 uses 4x850cc and ran 12.8s@110mph in the 1/4 mile). It uses a production motor with production parts, although some parts are from the S5 RX-7 introduced in 1989. Even 1980s tech can be good. The only problem is that CPUs are not like drag race cars but like endurance race cars. Consistent performance over a long time is desired for a CPU, as there are no speed limits. It is just a matter of supplying enough instructions, like fuel for a car, where code density, like energy-dense gasoline, is an advantage.
The Wankel rotary engine and RX-7 are actually better known for endurance racing and handling using gasoline. A lightweight engine with few moving parts, allowing a low center of gravity, with no valves to blow out or get in the way of turbos, offers certain advantages, and the engine could be much lighter if it were all aluminum; I can already pick it up by myself, it being the size of a beer keg without intake or exhaust manifolds. Performance was never a problem for the rotary engine either, which is kind of like CISC CPUs, which have more performance potential than RISC CPUs, yet piston engines replaced rotary engines like RISC CPUs replaced CISC CPUs. Well, RISC CPUs became more CISC-like while retaining the RISC propaganda, whereas it is not possible for piston engines to become more like a rotary engine. The rotary engine still has potential too, especially with multi-fuels and as a small lightweight engine for recharging batteries. The M-1 tank uses a rotary engine for auxiliary power and it is much more practical than the thirsty turbine engine. Diesel engines are hard to beat for tanks and better for tanks than gasoline engines, as the Germans discovered during WWII, where fire-breathing tanks are not as good as fire-breathing race cars, but diesel is more difficult to make from coal. Fuel supply is just as important for internal combustion engines as instruction supply is for CPUs, as both become useless without them.
cdimauro Quote:
It's just a number backed by not a single piece of supporting data -> irrelevant.
Pay attention: if we consider that number relevant, then it would mean that VLE was/is performing WAY BETTER than Thumb-2, BA2 and NanoMIPS, getting very, very close to the 8086 results. Just take a look at the benchmarks reported on the first page of this thread, apply this 30% to all the PowerPC data, and you can figure out for yourself how completely unrealistic it would be. In fact, VLE doesn't even support 32-bit immediates, and despite that it would show far better results than NanoMIPS and even BA2 (which is the best architecture in terms of pure code density; the best 32-bit architecture, to be more precise).
Freescale has to show where this number comes from.
|
A 30% claim for PPC VLE is typical of compressed RISC ISAs. Just above in post #408 I quoted RISC-V documentation which gave 25-30% reduced code size relative to their starting points for compressed RISC ISAs.
The RISC-V Compressed Instruction Set Manual https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf Quote:
The initial RISC ISAs from the 1980s all picked performance over code size, which was reasonable for a workstation environment, but not for embedded systems. Hence, both ARM and MIPS subsequently made versions of the ISAs that offered smaller code size by offering an alternative 16-bit wide instruction set instead of the standard 32-bit wide instructions. The compressed RISC ISAs reduced code size relative to their starting points by about 25–30%, yielding code that was significantly smaller than 80x86. This result surprised some, as their intuition was that the variable-length CISC ISA should be smaller than RISC ISAs that offered only 16-bit and 32-bit formats.
|
It is interesting that you mention the best compressed RISC ISAs getting close to 8086 code density when the RISC-V documentation says the 25-30% reduced code size is significantly smaller than 80x86 code size. That is quite the decline in code density from the 8086 to 80x86, even though 8086 code can execute on 80x86 CPUs. In reality, the 808x and x86 ISAs have significantly better code density if code is size-optimized for 8-bit datatypes, stack accesses and 6 GP registers. The 68k has a 32-bit ISA with more efficient use of larger datatypes and 16 GP registers, so the code density remains good when optimizing for performance too. The Vince Weaver contest has the 68k with ~45% better code density than PPC, and x86 at ~30% better code density. Where 25-30% better code density may be typical for compressed RISC ISAs, I expect the best code density ISAs to be 40%-50% better than classic RISC ISAs like MIPS, SPARC and PPC (and excluding Alpha and PA-RISC). It is easy to show any ISA not reaching its code density potential, as RISC-V studies have demonstrated. Compiler options, compiler selection and benchmark selection play large roles in code density.
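Because "X% better code density" gets used loosely in these comparisons, a tiny sketch of the two possible readings, using only the percentages quoted above (the helper names are illustrative):

# Two readings of "X% better code density" as a relative code size.
def size_if_denser(pct):
    return 1.0 / (1.0 + pct)   # density is (1+X) times higher

def size_if_smaller(pct):
    return 1.0 - pct           # code is simply X% smaller

for label, pct in (("68k vs PPC, ~45% better", 0.45), ("x86 vs PPC, ~30% better", 0.30)):
    print(f"{label}: {size_if_denser(pct):.2f}x or {size_if_smaller(pct):.2f}x the PPC code size")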
bhabbott
Re: The (Microprocessors) Code Density Hangout | Posted on 3-Sep-2025 2:34:57 | [ #415 ]
Cult Member | Joined: 6-Jun-2018 | Posts: 567 | From: Aotearoa

@cdimauro
Quote:
cdimauro wrote: @bhabbott
you simply (!) don't know what you're talking about.
It was already reported several times. There are studies which show that a 25-30% code density improvement is roughly equivalent to a system with HALF the code cache size. I repeat again for YOUR benefit: HALF the code cache. |
11-13% is half as much as 25-30%, so equivalent to a system with 2/3 to 3/4 the cache size. But what does this mean? Code has to fit in the cache to benefit, so more compact code can do more at cache speed. Great. Then bloat wipes it all out. In this case the code only has to bloat by 11-13% to wipe out the gains. In the real world that's nothing.
Quote:
I'll reveal a secret to you: 64-bit has limits. 2^64, precisely (but that's high-order math). |
In practice it's no limit. 2^64 bytes is 16 exbibytes (about 18.4 exabytes). A top-end desktop computer today might have 64 gigabytes, or ~0.0000004% of the theoretical maximum.
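For reference, the arithmetic behind that ceiling (a trivial sketch using only the numbers in this post):

# 2^64 bytes in decimal and binary units, and what 64 GiB is as a fraction of it.
limit = 2 ** 64
print(limit / 1e18)           # ~18.45 exabytes (decimal EB)
print(limit / 2 ** 60)        # 16.0 exbibytes (EiB)
ram = 64 * 2 ** 30            # a 64 GiB desktop
print(f"{ram / limit:.10%}")  # ~0.0000004% of the theoretical maximum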
Quote:
And here comes again the magic word which Bruce repeats like a parrot whenever he's not able to accept reality (which is very different from the cave where he's living): bloat. |
The cave I am living in is an Amiga 1200 with a 50MHz 68030 and 32MB RAM. The CPU has a 256-byte instruction cache. The simplicity of that cave appeals to me, and I like living within its confines. However, many others don't like being so restricted. Today you can throw a PiStorm into your Amiga and have mind-blowing performance - yet that still isn't enough for some.
The irony is that you guys are also living in a cave. While you pontificate about which ISA has the best code density, Amiga coders write apps in Hollywood or port over games designed for much more powerful PCs. The 68k might have slightly better code density than ARM, but that's irrelevant when the Pi's CPU is running at several GHz and emulates a 68k much faster than any real one.