cdimauro (Elite Member; Joined: 29-Oct-2012; Posts: 4444; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 6:54:33 [ #281 ]

@Hammer
Quote:
Hammer wrote: @cdimauro
Quote:
FACT: the article was about a study and Jaguar was NOT part of the study, but only BobCat was. So, and again, you're going OUTSIDE the topic/context with the sole purpose of defending your beloved AMD (BTW, have you bought stocks from it?)
|
FACT: Jaguar is the direct successor to Bobcat. What you didn't get is that the Jaguar is superior to the Bobcat uarch. Are you too stupid to realize a direct successor uArch is superior to the older uArch?
FACT: Jaguar has two game console design wins, NOT BobCat.
https://www.cpubenchmark.net/cpu.php?cpu=AMD+Athlon+5370+APU&id=2763
AMD Athlon 5370 4-core (Jaguar) 2.2 GHz APU, physics (Bullet Physics library): 129 frames/sec
https://www.cpubenchmark.net/cpu.php?cpu=ARM+Cortex-A15+4+Core+2000+MHz&id=5261
ARM Cortex-A15 4-core 2 GHz, physics (Bullet Physics library): 106 frames/sec
https://www.cpubenchmark.net/cpu_test_info.html
The Physics Test uses the Bullet Physics Engine (version 2.88 for x86, 3.07 for ARM).
This is in the 2011 context of the Xbox One's and PlayStation 4's development cycle.
https://www.tomshardware.com/video-games/console-gaming/amd-to-design-processor-for-xbox-next-team-red-extends-long-standing-microsoft-partnership Date: June 2025, AMD to design processor for Xbox Next
https://videocardz.com/newz/amd-reportedly-won-contract-to-design-playstation-6-chip-outbidding-intel-and-broadcom Date: Sep 2024, AMD reportedly won contract to design PlayStation 6 chip, outbidding Intel and Broadcom.
That's three game console generations. Let that sink in.
My arguments are framed within the Amiga 500's majority use case i.e. games.
Quote:
FACT: YOU stated that the benchmarks used were "nearly useless" (YOUR words) for gaming. I asked for proof of that, which did NOT come, because it's clearly evident that yours was a pure load of b@lls that nobody with a grain of sense could sustain.
|
Are you so stupid? Even Intel banned UserBenchmark, e.g. https://www.reddit.com/r/intel/comments/g36a2a/userbenchmark_has_been_banned_from_rintel/
|
Quote:
Hammer wrote: @cdimauro
Quote:
Another proof that you do NOT read what people write. Here's the main chart with the benchmark results:
Do you see ONLY SPEC INT there or even SOMETHING ELSE?!?
As has been proven already several times, you're just a PARROT which repeats the same meaningless things and posts things taken from googling around, without even understanding their context and, what's worse, the CONTEXT of the discussions.
|
You FAILED to factor in the mixed integer/floating point game use case.
SPEC INT and SPEC FP benchmarks are separate from each other, while the Quake benchmark is a mixed integer/floating point game use case.
A SPEC INT and SPEC FP focus has trainwrecked PPC's mixed integer/floating point game use case, e.g. Doom 3 PPC vs x86.
https://barefeats.com/doom3.html
From Glenda Adams, Director of Development at Aspyr Media,
PowerPC architectural differences, including a much higher penalty for float to int conversion on the PPC. This is a penalty on all games ported to the Mac, and can't be easily fixed. It requires re-engineering much of the game's math code to keep data in native formats more often. This isn't 'bad' coding on the PC -- they don't have the performance penalty, and converting results to ints saves memory and can be faster in many algorithms on that platform. It would only be a few percentage points that could be gained on the Mac, so it's one of those optimizations that just isn't feasible to do for the speed increase.
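As a minimal C sketch of the pattern Adams describes (hypothetical names, not Aspyr's actual code): PC game code freely quantizes float results to int, which was cheap on x86 but expensive on PPC, so a PPC port would want to keep data in native float format longer.

    /* Hypothetical example of the float-to-int pattern described above. */
    #include <stdint.h>

    typedef struct { float x, y, z; } vec3;

    /* PC-style code: quantize to int to save memory; the float-to-int
       conversion is cheap on x86 but carried a heavy penalty on PPC. */
    static int16_t quantize(float v)
    {
        return (int16_t)(v * 256.0f);   /* float -> int conversion */
    }

    /* PPC-friendly rework: keep the math in native float format and
       convert only once at the very end, if at all. */
    static float sum_components(const vec3 *p, int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += p[i].x + p[i].y + p[i].z;  /* no int conversions here */
        return sum;
    }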
Have you run the Lightwave benchmark comparing the AC68080 and MC68060 rev6?
Lightwave's performance difference is not like the Quake benchmark, which showcases the AC68080's advantage over the 68060 rev6. https://eab.abime.net/showthread.php?t=113338 Lightwave benchmark for users with PPC, 060, PiStorm, Vampire accelerators
Apollo Ice Drake's Lightwave benchmark results weren't a major leap from 68060 Rev6. 68080's quad instruction issue per cycle wasn't matched with multiple floating point pipelines. |
I will NOT spend further time replying with the same things to a BOT for which I've already proved that it:
- is not able to understand the context of the discussion;
- doesn't have a clue about the arguments it pretends to talk about;
- is not able to read, even charts which a kid can understand;
- lacks elementary logic.
I've already given you ALL the answers on the PRECISE points / contexts of the discussion, and AFTER that YOU decided to escape like a bunny, since other people also criticized your idiotic garbage.
I'll not waste my limited time on the same topics again: grow up and, even more important, STUDY, since you're completely ignorant! |
cdimauro (Elite Member; Joined: 29-Oct-2012; Posts: 4444; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 6:55:57 [ #282 ]

@Hammer
Quote:
Hammer wrote: @cdimauro
Quote:
OK, and? What's the point? |
https://www.phoronix.com/news/Intel-AVX10-Drops-256-Bit Date: March 2025. Intel drops the 256-bit-only AVX10 option for E-cores; future Intel desktop CPUs will have 512-bit AVX10.2.
Intel made a U-turn on AVX-512 support for desktop: all future Intel platforms will support a 512-bit vector width.
Pat Gelsinger was fired in December 2024.
The GCC patches also spell it out clearly:
"In this new whitepaper, all the platforms will support 512 bit vector width (previously, E-core is up to 256 bit, leading to hybrid clients and Atom Server 256 bit only). Also, 256 bit rounding is not that useful because we currently have rounding feature directly on E-core now and no need to use 256-bit rounding as somehow a workaround. HW will remove that support.
Thus, there is no need to add avx10.x-256/512 into compiler options. A simple avx10.x supporting all vector length is all we need. The change also makes -mno-evex512 not that useful. It is introduced with avx10.1-256 for compiling 256 bit only binary on legacy platforms to have a partial trial for avx10.x-256. What we also need to do is to remove 256 bit rounding."
Stop defending Intel (Pat Gelsinger)'s absurd flip-flopping with AVX-512. |
I already and IMMEDIATELY reported this news as soon as it was published AND added my comment on it (NOT in favour of Intel), idiot! |
ppcamiga1 (Super Member; Joined: 23-Aug-2015; Posts: 1020; From: Unknown)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 6:57:27 [ #283 ]

matthey what you wrote is pure bs. stop trolling start working on mui on unix
cdimauro (Elite Member; Joined: 29-Oct-2012; Posts: 4444; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 7:21:30 [ #284 ]

@matthey
Quote:
matthey wrote:
kolla Quote:
Commodore OS Vision being a common Linux distro with certain "theme" for Gnome?
|
Commodore OS Vision is what Commodore would have created with their amazing "vision" of the future, according to Peri. It is an officially branded "Commodore" OS which is all that should matter for full acceptance by every Commodore and Commodore Amiga customer and user. |
 Quote:
kolla Quote:
Pfff... the main "game" on AmigaOne is the tinkering and shopping required to get all the components needed to even launch any of the handful of 3D games that exist, and then showing off screenshots with less impressive frames-per-second counts.
|
But AmigaOne is officially "AmigaOne" branded and uses officially branded "AmigaOS 4". 3D support and games are their strength. |
Nevertheless, how many 3D games are EXCLUSIVELY available for OS4?
And, far more importantly, which apps & games are they mostly using? Quote:
kolla Quote:
Even if that had happened, it would not imply that Amiga's core audience is in this for "3D gaming", because that is simply not true - the core audience are in it for the more than 2600 legacy games and close to 1000 demos one can run through WHDLoad. It is nice that some "2.5D" games classics like Doom and Quake also have become available with faster CPUs and faster RTG, but actual 3D games are extremely few on Amiga, and it's all quite cumbersome and convoluted. Anyone who's really into what you can call "3D gaming" is certainly _not_ seeking any sort of Amiga for it.
As for the A600GS/A1200NG - though the Orange Pi zero 3 has a Mali GPU, it is not exposed from inside Amiberry. However work has been ongoing to use OpenGL (which Mali supports) to render the emulation screen (via SDL) and add CRT-like effects such as fake scanlines etc. However, as stated on the Amiberry build page... "Currently not fully implemented! Do not enable!" - https://github.com/BlitterStudio/amiberry/wiki/Compile-from-source
|
If the "more than 2600 legacy games and close to 1000 demos" are what is important for the Amiga then why is it the high performance Amiga hardware replacements that are selling instead of, for example, a $45 USD FleaFPGA Ohm? |
Exactly. Which means that there's a market for having much more.
Otherwise Vampire and PiStorm make no sense. Quote:
The extra performance is not needed for most of the Amiga retro games, so is the performance desired for 3D, and if not, what else? |
I'd suggest making a poll.
I'm pretty sure that the vast majority is playing the good old games, but the poll should exclude them and ask WHAT ELSE Amigans spend their time on with their Amiga and/or post-Amiga platform. Quote:
If Amiga fans want 3D and are willing to replace everything Amiga but the retro games, is Commodore OS Vision on x86-64 hardware in an Amiga case the best solution? |
We don't need another PC/x86 or ARM for that...
Anyway, could we please use this thread only for discussing code density (memory footprint is also fine, because code density is part of it)? I'd prefer to have other discussions moved to specific threads. Thanks. |
cdimauro (Elite Member; Joined: 29-Oct-2012; Posts: 4444; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 7:23:36 [ #285 ]

@ppcamiga1
Quote:
ppcamiga1 wrote: matthey what you wrote is pure bs. stop trolling start working on mui on unix |
Move your idiocy out of this thread and YOU start working on YOUR shit (Unix is shit). |
kolla (Elite Member; Joined: 20-Aug-2003; Posts: 3477; From: Trondheim, Norway)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 20:42:28 [ #286 ]

@cdimauro
Unix is the shit that lets you post here.
_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC |
cdimauro (Elite Member; Joined: 29-Oct-2012; Posts: 4444; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 22-Jun-2025 20:49:56 [ #287 ]

@kolla
Quote:
kolla wrote: @cdimauro
Unix is the shit that lets you post here. |
Irrelevant.
And:
Anyway, could we please use this thread only for discussing code density (memory footprint is also fine, because code density is part of it)? I'd prefer to have other discussions moved to specific threads. Thanks. |
matthey (Elite Member; Joined: 14-Mar-2007; Posts: 2756; From: Kansas)
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jun-2025 0:51:23 [ #288 ]

cdimauro Quote:
I agree. Those results are a big surprise (especially considering the ones for the ARC architecture) and they should be taken with a grain of salt.
What I don't like is that the guy hasn't shared all the compilation options that he used, because some were tweaked by his own admission. So, it's difficult to reproduce the results or make some changes (like removing frames) to make a fair comparison (or to try different compilers, like LLVM).
|
The graphed data here is from the data you gave in post #5 of this thread. I found the graphed data and thought it would be nice to include it in the thread, even though I agree the data is not the best for comparing code density. The best and most comprehensive comparison of code density remains the "SPARC16: A new compression approach for the SPARC architecture" but it is old and does not include newer ISAs like AArch64 and RISC-V.
 SPARC16: A new compression approach for the SPARC architecture https://www.researchgate.net/publication/221306454_SPARC16_A_new_compression_approach_for_the_SPARC_architecture
Neither does it include the ARC architecture. The "High-Performance Extendable Instruction Set Computing" study does, and there ARC has the 3rd worst code density, closely behind Alpha and PA-RISC. So ARC went from 3rd worst to 2nd best code density between studies. This could be explained by flexible configurations, including the number of GP registers, or by major changes to the ARC ISA. One of the most recent major changes to the ARC ISA is: "Synopsys ARC-V Processor IP is based on the open standard RISC-V instruction set architecture (ISA), extending the current ARC portfolio and giving customers access to the growing RISC-V ecosystem."
https://www.synopsys.com/designware-ip/processor-solutions/arc-v-processors.html
Most small embedded processors are configurable, including the CPU ISA. This is the market ARM served with their a la carte Thumb processors, but that is now available for just the lowest end Cortex-M processors, while higher end embedded ARM cores are forced to scale up to the standard 64-bit AArch64 ISA and fat 64-bit Cortex-A cores. Configurable RISC-V is filling the gap ARM is leaving, but there is an advantage to standard hardware, even at the low end. The 68k Amiga standards are a good example of standard hardware with a tiny footprint that supported the development of a large software library. There was no standard embedded hardware like it until RPi hardware, perhaps from Eben Upton remembering his 68k Amiga and what was possible with standard hardware with a tiny footprint. Standard hardware allows economies of scale to reduce the cost, even though ARM with Linux could not scale down nearly as far as the 68k Amiga footprint.
cdimauro Quote:
Indeed. The Amiga OS is very very efficient when talking about memory consumption, and we know it very well.
|
As tiny as the 68k AmigaOS footprint is compared to other modern standard hardware, the non idle AmigaOS footprint could have been improved further with a larger ROM. The 68k Amiga started with a small 256kiB ROM, moved up to a 512kiB ROM which was the standard for a long time and the CD32 ended with a 1MiB ROM. The 512kiB Amiga 4000T ROM was already not large enough and the current degradation of the 68k AmigaOS back to the 68000 results in modules placed on disk and using more memory. Dave Haynie mentioned in a video experimenting with NOR flash (replacing ROM) which could have reduced both the idle and non idle footprint. Modules could have been bug fixed in NOR flash saving the memory to build the LVOs in memory although this would be incompatible at this point (the LVOs could have resided in NOR flash with the code). ROMs were mostly replaced by NOR flash but it does not scale below about a 40nm chip fab process so alternatives are being considered today. For performance, it may be better to load all modules into memory allowing more PC relative addressing and the 68k Amiga would still have a very small footprint.
cdimauro Quote:
That's something which I don't get: with much more registers available on PowerPCs, you should have required LESS stack storage compared to equivalent 68k applications.
Unless there's something in the ABI which requires additional storage (like some allocated memory area of fixed/minimum size in the stack).
|
PPC has more registers to store and load when necessary, which is more common than might be expected with 32 GP registers: only roughly half are of the callee/caller saved type needed, and the prologue/epilogue register saving and restoring likely saves extra registers given the absence of a standard PPC load/store multiple instruction (like the 68k MOVEM instruction). The PPC stack frames are ABI required and use extra stack space. PPC stack alignment ABI requirements increase stack usage too. PPC AmigaOS 4 has also greatly expanded the use of tags/taglists, which are var args built in memory, usually on the stack (see the sketch below). Maybe there are other reasons as well, as this still does not explain a more than 10x increase in stack usage. It appears that PPC memory traffic for the stack would exceed that of the 68k, and PPC has a reg arg ABI for function calls while the most common 68k ABI has a stack based arg ABI (Amiga library function calls pass args in registers though).
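To make the taglist point concrete, here is a hedged sketch (the struct follows the AmigaOS convention, the tag IDs and sizes are assumptions for the 32-bit ABI) of why taglist-heavy APIs consume stack: every varargs call materializes an array of tag pairs in the caller's frame.

    /* Hedged model of an AmigaOS-style taglist, assuming the 32-bit ABI. */
    #include <stdint.h>

    struct TagItem { uint32_t ti_Tag; uint32_t ti_Data; };
    #define TAG_END 0u

    /* A varargs call like OpenWindowTags(..., WA_Width, 640, WA_Height,
       480, TAG_END) builds the equivalent of this array in memory,
       usually on the caller's stack: */
    static const struct TagItem example_tags[] = {
        { 100, 640 },     /* hypothetical width tag  */
        { 101, 480 },     /* hypothetical height tag */
        { TAG_END, 0 }
    };
    /* Each pair costs 8 bytes per call site; an API built largely on
       taglists multiplies this across every OS call. */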
cdimauro Quote:
MMUs affect the overall memory footprint, but certainly not the stack usage.
If the default stack is 4KB on the Amiga OS and AmigaOS4 sets a 4KB memory page granularity on PowerPCs, then there's absolutely no difference on the memory required for the stack.
|
One byte over the default 4kiB stack size would require 8kiB of stack, and an extra read only page may be placed below the stack to detect stack overflows. There would be some extra stack usage when using MMU pages for the stack, but the benefit of detecting stack overflows would improve system reliability and may be worth it. It could be possible to disable the MMU based stack overflow detection and extension features to save memory. Some processes/tasks have well understood and minimal stack usage, allowing the stack size to be reduced under 1kiB, while others allocate a variable amount of stack data based on data loaded, with no way to know the maximum stack usage ahead of time.
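A minimal sketch of the page-granularity arithmetic (assuming 4kiB pages plus one guard page below the stack):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    /* Round a requested stack size up to whole MMU pages, plus a guard. */
    static unsigned long stack_alloc(unsigned long request)
    {
        unsigned long pages = (request + PAGE_SIZE - 1) / PAGE_SIZE;
        return (pages + 1) * PAGE_SIZE;   /* +1 for the guard page */
    }

    int main(void)
    {
        printf("%lu\n", stack_alloc(4096)); /* 8192: one stack page + guard   */
        printf("%lu\n", stack_alloc(4097)); /* 12288: two stack pages + guard */
        return 0;
    }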
cdimauro Quote:
On 64-bit systems the stack memory likely requires to be raised to 8KB because of the double registers/pointers sizes and/or for keeping the stack aligned to 64-bit, but here we're talking about 32-bit applications.
|
We are comparing 64-bit vs 32-bit memory usage, including the stack. The 64-bit stack usage may double with 64-bit pointers, but doubling the registers could cause register-save stack usage to quadruple. For example, storing all the GP registers to the stack uses quadruple the stack space:
32-bit x86:     8 x 32b  = 256b     64-bit x86-64:  16 x 64b = 1024b
32-bit AArch32: 16 x 32b = 512b     64-bit AArch64: 32 x 64b = 2048b
I expect doubling the stack size for 64-bit would be adequate for most programs but it may not be enough for worst case stack usage.
cdimauro Quote:
BTW, the Windows and, especially, Linux requirements are so high. Very very strange...
|
The default stack size is crazy large, especially where the MMU can be used for stack overflow detection and extension. Maybe they are using large MMU pages for the stack to improve performance, but performance degrades if the system is running low on memory and paging.
cdimauro Quote:
Right. There's a consistent increase in the memory usage on 64-bit systems. Which is more clearly shown by checking the results for the x32 system (e.g.: 32/64 in the chart), since this ABI is using 32-bit pointers (not for pushes & calls, if I recall correctly, since 64-bit results are used in those cases).
That's interesting, because it makes a stronger case for 64-bit architectures which support 32-bit (or even less) size for pointers.
|
Strangely, the X32 ABI was more popular than the AArch64 ILP32 ABI, which also uses 32-bit pointers. I would have thought the embedded market would be interested in the AArch64 ILP32 ABI to save memory, lowering costs and improving performance. The problem may be poor support from ARM and Linux developers, who support and encourage fat desktop ARM hardware when embedded ARM hardware is more popular. Much of Linux development has dropped 32-bit support as it is often slower, even though the AArch64 ILP32 ABI should have improved performance over the AArch64 LP64 ABI, much like the X32 ABI offers improved performance over the x86-64 ABI.
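For reference, a hedged sketch of how the pointer-size difference shows up in practice, using GCC's -m32/-mx32/-m64 targets on x86-64 Linux:

    /* The same translation unit built three ways: */
    #include <stdio.h>

    int main(void)
    {
        /* gcc -m32  : long=4 ptr=4 (i386, 32-bit registers)     */
        /* gcc -mx32 : long=4 ptr=4 (x32, full 64-bit registers) */
        /* gcc -m64  : long=8 ptr=8 (x86-64 LP64)                */
        printf("long=%zu ptr=%zu\n", sizeof(long), sizeof(void *));
        return 0;
    }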
cdimauro Quote:
Strange. Doubling the pointer sizes shouldn't bring to such results. There should be some other factor which is influencing them.
|
It is a 1GiB 64-bit system, so I suspect it started paging when low on memory. Nearly half the memory is gone just booting Ubuntu on such a system.
cdimauro Quote:
I agree. To me it doesn't make sense to switch everything to 64-bit, and the RPi situation is a clear example of too-fast decisions made without taking all factors into account.
If you've got 8 (or even 16) GB, requiring all applications to be 64-bit just wastes memory without any reason.
Even considering browsers, which are resource hogs nowadays, a single process (tab in the browser) doesn't need a 64-bit address space. I'm pretty sure that the most complex web page/application can be fine using at most 4GB of space (I mean: max 4GB per each opened tab).
That's the reason why I advocate 64-bit systems which support a 32-bit model ("medium") for almost all applications, leaving the 64-bit model ("large") only for applications which really need to handle / access more than 4GB.
|
I expect memory hog web browsers have encouraged the Linux emphasis on 64-bit but part of the problem is that Linux wastes so much memory just to boot up with a GUI.
cdimauro (Elite Member; Joined: 29-Oct-2012; Posts: 4444; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jun-2025 5:00:07 [ #290 ]

@matthey
Quote:
matthey wrote: cdimauro Quote:
I agree. Those results are a big surprise (especially considering the ones for the ARC architecture) and they should be taken with a grain of salt.
What I don't like is that the guy hasn't shared all the compilation options that he used, because some were tweaked by his own admission. So, it's difficult to reproduce the results or make some changes (like removing frames) to make a fair comparison (or to try different compilers, like LLVM).
|
The graphed data here is from the data you gave in post #5 of this thread. I found the graphed data and thought it would be nice to include it in the thread, even though I agree the data is not the best for comparing code density. |
Yes, I know. I've included it in #5 because I'm collecting there all the data I've found regarding code density: there's always something interesting that might pop up, since different tests might bring (very) different results, as this benchmark has shown. Quote:
The best and most comprehensive comparison of code density remains the "SPARC16: A new compression approach for the SPARC architecture" but it is old and does not include newer ISAs like AArch64 and RISC-V. |
I agree. Those results reflect similar trends on other code density benchmarks. But, as you've said, it's old (and I don't even recall if "Thumb" is just Thumb or it's Thumb-2). Quote:
The "High-Performance Extendable Instruction Set Computing" study does where ARC has the 3rd worst code density closely behind Alpha and PA-RISC. ARC went from 3rd worst to 2nd best code density in different studies. This could be explained by flexible configurations including the number of GP registers or major changes to the ARC ISA. One of the most recent major changes to the ARC ISA is, "Synopsys ARC-V Processor IP is based on the open standard RISC-V instruction set architecture (ISA), extending the current ARC portfolio and giving customers access to the growing RISC-V ecosystem."
https://www.synopsys.com/designware-ip/processor-solutions/arc-v-processors.html |
I don't expect that such good ARC results came from the RISC-V version, because we've seen that this ISA isn't so good at code density.
The only way to clarify it is to try to reproduce the buildroot, at least for ARC, and check the produced object code. Quote:
cdimauro Quote:
Indeed. The Amiga OS is very very efficient when talking about memory consumption, and we know it very well.
|
As tiny as the 68k AmigaOS footprint is compared to other modern standard hardware, the non idle AmigaOS footprint could have been improved further with a larger ROM. The 68k Amiga started with a small 256kiB ROM, moved up to a 512kiB ROM which was the standard for a long time and the CD32 ended with a 1MiB ROM. The 512kiB Amiga 4000T ROM was already not large enough and the current degradation of the 68k AmigaOS back to the 68000 results in modules placed on disk and using more memory. Dave Haynie mentioned in a video experimenting with NOR flash (replacing ROM) which could have reduced both the idle and non idle footprint. Modules could have been bug fixed in NOR flash saving the memory to build the LVOs in memory although this would be incompatible at this point (the LVOs could have resided in NOR flash with the code). ROMs were mostly replaced by NOR flash but it does not scale below about a 40nm chip fab process so alternatives are being considered today. For performance, it may be better to load all modules into memory allowing more PC relative addressing and the 68k Amiga would still have a very small footprint. |
Nowadays it doesn't matter: the whole Amiga OS could stay on persistent storage (hard disk, SSD, Flash) and be loaded into memory at boot time. So, I don't care about NOR systems. For embedded systems, we've got plenty of space on Flash memory, which should perfectly fit the scope.
But I don't agree with putting the LVOs in flash: the Amiga OS engineers already did dirty things by internally calling library functions WITHOUT using the LVOs, to squeeze out the most space possible. And this is dirty because there's a "contract" to be respected, which is SetFunction, as I told them last year in a FB discussion (on the last article which I published). Quote:
cdimauro Quote:
That's something which I don't get: with much more registers available on PowerPCs, you should have required LESS stack storage compared to equivalent 68k applications.
Unless there's something in the ABI which requires additional storage (like some allocated memory area of fixed/minimum size in the stack).
|
PPC has more registers to store and load when necessary which is more common than may be expected with 32 GP registers but only roughly half are of the callee/caller saved type needed and the prologue/epilogue register saving and restoring likely saves extra registers without a standard PPC load/store multiple instruction (like the 68k MOVEM instruction). |
PowerPCs have load/store multiple register instructions, so that's not the case. Quote:
The PPC stack frames are ABI required and use extra stack space. PPC stack alignment ABI requirements increase stack usage. PPC AmigaOS 4 has greatly expanded the use of tags/taglists which are var args built in memory, usually on the stack. Maybe there are other reasons too as this still does not explain a more than 10x increase in stack usage. It appears that PPC memory traffic for the stack would exceed that of the 68k and PPC has a reg arg ABI for function calls while the most common 68k ABI has a stack based arg ABI (Amiga library function calls pass args in registers though). |
There should be other reasons, because passing args on the stack with the 68k ABI should require more or less the same space as passing them in regs (PowerPC ABI) while saving the old values on the stack and restoring them afterwards.
What might matter here is the "RISC factor": using many more registers inside a function to accomplish the same task, whereas on the 68k you can use immediates and instructions can directly access memory, which requires far fewer registers. Quote:
cdimauro Quote:
MMUs affect the overall memory footprint, but certainly not the stack usage.
If the default stack is 4KB on the Amiga OS and AmigaOS4 sets a 4KB memory page granularity on PowerPCs, then there's absolutely no difference on the memory required for the stack.
|
One byte over the default 4kiB stack size would require 8kiB of stack and an extra read only page may be placed below the stack to detect stack overflows. There would be some extra stack usage if using MMU pages for the stack but the benefit of detecting stack overflows would improve system reliability and may be worth it. It could be possible to disable MMU using stack overflow and extension features to save memory. Some processes/tasks have well understood and minimal stack usage allowing the stack size to be reduced under 1kiB while others allocate a variable amount of stack data based on data loaded with no way to know the maximum stack usage ahead of time. |
A guard page to detect stack overflow is perfectly fine. But it's always ONE page, regardless of the allocated stack. Quote:
cdimauro Quote:
On 64-bit systems the stack memory likely requires to be raised to 8KB because of the double registers/pointers sizes and/or for keeping the stack aligned to 64-bit, but here we're talking about 32-bit applications.
|
We are comparing 64-bit vs 32-bit memory usage, including the stack. The 64-bit stack usage may double with 64-bit pointers, but doubling the registers could cause register-save stack usage to quadruple. For example, storing all the GP registers to the stack uses quadruple the stack space:
32-bit x86:     8 x 32b  = 256b     64-bit x86-64:  16 x 64b = 1024b
32-bit AArch32: 16 x 32b = 512b     64-bit AArch64: 32 x 64b = 2048b
I expect doubling the stack size for 64-bit would be adequate for most programs but it may not be enough for worst case stack usage. |
I don't expect much stack increase just from that, because most functions don't use many args. So, register usage merely for passing args to functions should not change much when you already start with 16 registers.
There are cases where many args are passed, but those are the typical "exceptions to the rule", and they shouldn't influence the general results/trend. Quote:
cdimauro Quote:
BTW, the Windows and, especially, Linux requirements are so high. Very very strange...
|
The default stack size is crazy large, especially where the MMU can be used for stack overflow detection and extension. Maybe they are using large MMU pages for the stack to improve performance but the performances degrades if the system is running low on memory and paging. |
No, large pages aren't normally used on a modern OS, unless the application "requires" it.
And it would have been a very dumb decision for an OS which is running on a system with little available memory.
Probably the memory is just reserved in the address space but not actually allocated (so it will not be paged out when available memory is low), and will only be allocated when needed. Just a guess... Quote:
cdimauro Quote:
Right. There's a consistent increase in the memory usage on 64-bit systems. Which is more clearly shown by checking the results for the x32 system (e.g.: 32/64 in the chart), since this ABI is using 32-bit pointers (not for pushes & calls, if I recall correctly, since 64-bit results are used in those cases).
That's interesting, because it makes a stronger case for 64-bit architectures which support 32-bit (or even less) size for pointers.
|
Strangely, the X32 ABI was more popular than the AArch64 ILP32 ABI also using 32-bit pointers. I would have thought the embedded market would be interested in the AArch64 ILP32 ABI to save memory lowering costs and improve performance. The problem may be poor support from ARM and Linux developers who support and encourage fat desktop ARM hardware when embedded ARM hardware is more popular. Much of Linux development has dropped 32-bit support as it is often slower even though AArch64 ILP32 ABI should have improved performance over the AArc64 LP64 ABI much like the X32 ABI offers improved performance over the x86-64 ABI. |
Likely. The developers of OSes & libraries preferred to drop x32 support "simply" (!) because they don't want to pay for this "burden", and AArch64 ILP32 is probably facing the same fate.
I can understand that support is a cost, but it's a cost well spent, especially on systems with limited available resources. |
michalsc (AROS Core Developer; Joined: 14-Jun-2005; Posts: 440; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jun-2025 6:09:17 [ #291 ]

@matthey
Quote:
Strangely, the X32 ABI was more popular than the AArch64 ILP32 ABI, which also uses 32-bit pointers. I would have thought the embedded market would be interested in the AArch64 ILP32 ABI to save memory, lowering costs and improving performance. The problem may be poor support from ARM and Linux developers, who support and encourage fat desktop ARM hardware when embedded ARM hardware is more popular. Much of Linux development has dropped 32-bit support as it is often slower, even though the AArch64 ILP32 ABI should have improved performance over the AArch64 LP64 ABI, much like the X32 ABI offers improved performance over the x86-64 ABI. |
Maybe they are not interested because ARM v8-M (microcontroller profile - embedded) is not 64 bit but a variant of Thumb32... Just saying.
It is only ARM v8-A (application profile) and ARM v8-R (real-time profile) that support AArch64. |
OneTimer1 (Super Member; Joined: 3-Aug-2015; Posts: 1263; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 23-Jun-2025 11:06:21 [ #292 ]

@matthey Quote:
Strangely, the X32 ABI was more popular than the AArch64 ILP32 ABI, which also uses 32-bit pointers. |
I made my own tests on an RPi4 under Linux, and 32-bit applications were 1-3% faster than 64-bit. So as long as you don't need 64-bit, stay with 32-bit; and 2 GB of RAM is a lot for many embedded applications.
matthey (Elite Member; Joined: 14-Mar-2007; Posts: 2756; From: Kansas)
Re: The (Microprocessors) Code Density Hangout
Posted on 24-Jun-2025 0:23:24 [ #293 ]

cdimauro Quote:
I agree. Those results reflect similar trends on other code density benchmarks. But, as you've said, it's old (and I don't even recall if "Thumb" is just Thumb or it's Thumb-2).
|
Thumb-2 was released early enough for the SPARC16 paper and "arm(thumb)" likely represents the Thumb modes with the compiler choosing between Thumb and Thumb-2.
SPARC16: A new compression approach for the SPARC architecture https://www.researchgate.net/publication/221306454_SPARC16_A_new_compression_approach_for_the_SPARC_architecture Quote:
First presented by ARM on its ARM7 model, the next 16 bits processor extension in the market was Thumb. Thumb enabled ARM processors are capable of running code in both 32 and 16 bits modes and allow subroutines of both types to share the same address space, while the mode exchange is achieved during runtime through BX and BLX instructions, which are branch and call instructions that flip the current mode bit in a special processor register. To fit functionality in 16 bits, a group of only 8 registers together with a stack pointer and link registers are visible, the remaining registers can only be accessed implicitly or through special instructions. Results presented by ARM show a compression ratio ranging from 55% to 70%, with an overall performance gain of 30% for 16 bit buses and 10% loss for 32 bit ones. Thumb2 is the recent version of the original Thumb incremented with new features like the addition of specific instructions for operating system usage.
|
Thumb-2 would be chosen most of the time for performance, but the original Thumb ISA often has similar and sometimes better code density. The paper mentions only Thumb's 8 GP registers causing increased memory traffic, not the 30% or more increase in instructions executed compared to the original ARM ISA. Nor does it mention the Thumb-2 advantages of avoiding performance killing instruction pipeline flushes on mode switches, being able to access all 16 GP registers with 32-bit instructions, and being able to encode larger immediates and displacements in 32-bit instructions, or the resulting decrease in instructions executed and memory traffic. Thumb-2 was a big improvement over Thumb which a code density study alone does not show. Code density should be studied together with performance metrics/traits, but that is more difficult.
cdimauro Quote:
I don't expect that such good ARC results came from the RISC-V version, because we've seen that this ISA isn't so good at code density.
The only way to clarify it is to try to reproduce the buildroot, at least for ARC, and check the produced object code.
|
It is possible that ARC-V standardizes more of the optional RISC-V ISA extensions which provide better code density than the RV32GC compiler target although the ~13% code density improvement for ARC is more than I would expect. If ARC was using an improved ISA that competed with ARM Thumb ISAs, then it is interesting that they would give it up for RISC-V. RISC-V compressed code density competitiveness was better in this code density comparison than most others I have seen outside of RISC-V promoters. One thing RISC-V does not suffer from is lack of GP registers and the code density is good for the number of GP registers. It is the reduced instruction set "RISC" instructions and addressing modes which made the ISA weak. There are new ISA extensions to try to fix the known problems but will they become the Cortex-A like pseudo standard as the compressed extension became and will some of the handicap remain?
cdimauro Quote:
Nowadays it doesn't matter: the whole Amiga OS could stay on persistent storage (hard disk, SSD, Flash) and be loaded into memory at boot time. So, I don't care about NOR systems. For embedded systems, we've got plenty of space on Flash memory, which should perfectly fit the scope.
But I don't agree with putting the LVOs in flash: the Amiga OS engineers already did dirty things by internally calling library functions WITHOUT using the LVOs, to squeeze out the most space possible. And this is dirty because there's a "contract" to be respected, which is SetFunction, as I told them last year in a FB discussion (on the last article which I published).
|
Loading/decompressing libraries from NAND flash storage into memory has several advantages as a trade off to the increased memory footprint.
1. NAND flash is cheaper, higher performance and scales down better than NOR flash
2. good compatibility and upgradability can be maintained
3. PC relative libraries that do not need the library base in the a6 register could be used
Compatible PC relative libraries are possible right now for storage loaded Amiga libraries. This is possible by merging the whole library together in memory including the code, data, and LVO. A 68k CPU can not do a PC relative write/store so a LEA would be necessary first but this is low overhead and I believe better than maintaining the library base in the a6 register when many functions do not use it and it creates problems for 68k compiler support (most other 68k compiler targets default to a6 as the frame pointer). The (d16,PC) addressing mode would work for small libraries but a shorter encoding for (d32,PC) would improve the efficiency for larger libraries. PC relative writes/stores should be considered for a 68k64 ISA which could further improve efficiency.
With the introduction of PC relative libraries, new Amiga code which reallocates a6 for other purposes would not work with old libraries, but old code would be compatible with the new libraries. PC relative libraries would reduce how low a 68k system can scale in footprint, as a NOR flash Kickstart would use less memory, but it would not give up much considering NOR flash is not scaling below about a 40nm chip fab process and requires two dies, as the RP2354 stacked die packages demonstrate.
https://en.wikipedia.org/wiki/RP2350
The 68k Amiga tiny footprint is a significant advantage but there are other costs that are higher than the memory cost.
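On the LVO/SetFunction() "contract" raised above, here is a hedged C model (a simplification, not the real exec.library API) of why every call, internal ones included, must be dispatched through the vector table for patches to take effect:

    /* Simplified model of a library vector (LVO) table. */
    typedef long (*LibFunc)(void);

    struct FakeLibrary {
        LibFunc vector[16];   /* stand-in for the negative-offset LVOs */
    };

    /* SetFunction-style patch: swap a vector and return the old one so
       the patcher can chain to the original implementation. */
    static LibFunc set_function(struct FakeLibrary *lib, int slot, LibFunc fn)
    {
        LibFunc old = lib->vector[slot];
        lib->vector[slot] = fn;
        return old;
    }
    /* An internal call that jumps to the code directly, instead of going
       through lib->vector[slot], silently bypasses any installed patch. */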
cdimauro Quote:
PowerPCs have load/store multiple register instructions, so that's not the case.
|
The PPC ISA has load/store multiple register instructions, but the standard does not require them to be implemented in hardware, which is why prologues/epilogues are the standard way to save and restore GP registers. Also, the PPC STMW/LMW instructions only store and load multiple consecutive registers, where the 68k MOVEM loads and stores nonconsecutive registers from a list. Many unnecessary registers are stored to the stack compared to the 68k, whether the PPC STMW/LMW instructions or the prologue/epilogue method is used, because both only access consecutive GP registers.
Power Architecture 32-bit Application Binary Interface Supplement 1.0 - Linux & Embedded https://example61560.wordpress.com/wp-content/uploads/2016/11/powerpc_abi.pdf Quote:
_save32gpr_14: stw r14,-72(r11)
_save32gpr_15: stw r15,-68(r11)
...
_save32gpr_30: stw r30,-8(r11)
_save32gpr_31: stw r31,-4(r11)
blr
|
There is an example with vector registers on page 66 which requires an ADDI instruction for each GP register stored. This is why PPC stack sizes and memory traffic are crazy, and it does not even consider the extra loop unrolling and function inlining required by many RISC CPU core implementations for good performance.
cdimauro Quote:
There should be other reasons, because passing args on the stack with the 68k ABI should require more or less the same space as passing them in regs (PowerPC ABI) while saving the old values on the stack and restoring them afterwards.
What might matter here is the "RISC factor": using many more registers inside a function to accomplish the same task, whereas on the 68k you can use immediates and instructions can directly access memory, which requires far fewer registers.
|
No doubt the "RISC-factor" has synergies that bloat PPC programs. Once you go down the fat everything path, the snowball grows. The PPC philosophy is practically the opposite of the 68k philosophy of minimizing code size, minimizing memory traffic and supporting code sharing well with PC relative addressing. The 68k AmigaOS extends that philosophy and elegance while the PPC AmigaOS CPU ISA does not, at least if wanting to retain the small footprint advantage of the 68k AmigaOS. Hyperion had plans to enter the embedded market with PPC AmigaOS 4, but when the EfikaPPC with 128MiB of memory was not enough, it should have been clear that the replacement CPU ISA was chosen poorly for the AmigaOS and the embedded market.
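A small illustration of that "RISC-factor" (hypothetical compiler output, shown as comments):

    /* A memory-resident counter update: */
    void bump(long *counter)
    {
        *counter += 5;
        /* 68k: one instruction, no scratch register needed:
               addq.l #5,(a0)
           PPC: a load/modify/store triple plus a scratch register:
               lwz  r4,0(r3)
               addi r4,r4,5
               stw  r4,0(r3)                                       */
    }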
cdimauro Quote:
A guard page to detect stack overflow is perfectly fine. But it's always ONE page, regardless of the allocated stack.
|
The stack guard page could be eliminated with stack limit checking as newer ARM Cortex-M cores use.
TRUSTZONE TECHNOLOGY 04_LPC5500_TrustZone_v1.4.pdf Quote:
Stack limit checking
• As part of ARM TrustZone technology for ARMv8-M, there is also a stack limit checking feature. For ARMv8-M Mainline, all stack pointers have corresponding stack limit registers.
|
Some Cortex-M features are fixed instead of dynamic like Cortex-A features, though.
cdimauro Quote:
No, large pages aren't normally used on a modern OS, unless the application "requires" it.
And it would have been a very dumb decision for an OS which is running on a system with little available memory.
Probably the memory is just reserved in the address space but not actually allocated (so it will not be paged out when available memory is low), and will only be allocated when needed. Just a guess...
|
I think you are on the right track, but I would not rule out large MMU page sizes for the stack. A large MMU page size for the stack reduces TLB misses, which improves performance. The large Linux stack size may be a virtual memory size that is only partially allocated, but allocating more memory on demand is also bad for performance. Compilers bloat up code for minimal performance gains and Linux OSs can be expected to do the same. The problem is that when everything is bloated up for minimal performance gains, low memory paging may result in a major performance loss. It is kind of like the speed launchers that many Windows programs add into startup: they slow down startup and overall performance when too many are installed.
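A hedged POSIX sketch of the reserve-now, commit-on-demand behaviour guessed at above (Linux-flavoured; exact flags and overcommit behaviour vary by system):

    #include <sys/mman.h>
    #include <stddef.h>

    #define RESERVE (8UL * 1024 * 1024)   /* 8 MiB of address space */
    #define PAGE    4096UL

    void *reserve_stack(void)
    {
        /* Reserve address space only: inaccessible, no backing pages. */
        void *base = mmap(NULL, RESERVE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;
        /* Commit just the top page; further pages can be committed on
           fault, so an 8 MiB "stack" initially costs one page of RAM. */
        char *top = (char *)base + RESERVE - PAGE;
        mprotect(top, PAGE, PROT_READ | PROT_WRITE);
        return base;
    }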
cdimauro Quote:
Likely. The developers of OS & libraries preferred to drop supporting x32 "simply" (!) because they don't want to pay for this "burden", and AArch64 ILP32 is probably facing the same.
I can understand that support is a cost, but that it's a well spent cost, especially on systems with limited available resources.
|
Yes. The following is a thread suggesting x32 ABI support should be removed in Linux.
https://lwn.net/ml/linux-kernel/CALCETrXoRAibsbWa9nfbDrt0iEuebMnCMhSFg-d9W-J2g8mDjw@mail.gmail.com/
Linus Torvalds responds.
Linus Torvalds Quote:
Andy Lutomirski Quote:
I'm seriously considering sending a patch to remove x32 support from upstream Linux. Here are some problems with it:
|
I talked to Arnd (I think - we were talking about all the crazy ABI's, but maybe it was with somebody else) about exactly this in Edinburgh.
Apparently the main real use case is for extreme benchmarking. It's the only use-case where the complexity of maintaining a whole development environment and distro is worth it, it seems. Apparently a number of Spec submissions have been done with the x32 model.
I'm not opposed to trying to sunset the support, but let's see who complains..
Linus
|
I think it would have been easier to maintain x32 if the LLP64 data model had been chosen for Linux, like Windows, instead of the LP64 data model; LLP64 also reduces the memory footprint a little. Changing existing datatype sizes (32-bit long to 64-bit long) is a pain and it is better to define and use new ones, like the 64-bit long long.
https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
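The size difference in a nutshell (standard C, with the two data models noted in comments):

    /*           int  long  long long  void*                     */
    /* LP64  :    4     8       8        8   (Linux, macOS)      */
    /* LLP64 :    4     4       8        8   (64-bit Windows)    */
    #include <stdio.h>

    int main(void)
    {
        printf("int=%zu long=%zu llong=%zu ptr=%zu\n",
               sizeof(int), sizeof(long), sizeof(long long),
               sizeof(void *));
        return 0;
    }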
If new drivers are not being supported for the x32 ABI, it is difficult to maintain support but support often depends on the attitude of the developers providing the support.
michalsc Quote:
Maybe they are not interested because ARM v8-M (microcontroller profile - embedded) is not 64 bit but a variant of Thumb32... Just saying.
It is only ARM v8-A (application profile) and ARM v8-R (real-time profile) that support AArch64.
|
Cortex-M cores at the low end are MCU only and do not support all of Thumb and Thumb-2 ISAs, limiting potential OS support. Standard hardware has advantages even at the low end, which RPi understands: they upgraded from the RP2040 Cortex-M0+ to the RP2350 Cortex-M33, gaining full Thumb(-2) support and better backward compatibility with the Cortex-A cores in RPi hardware. There are 3 Cortex-M cores above this (Cortex-M52, Cortex-M55 and Cortex-M85) which support optional instruction/data caches for memory other than SRAM, up to a 7-stage integer instruction pipeline and an FPU, so it may be possible to configure a 32-bit Thumb compatible Cortex-M85 that is mostly compatible with the 32-bit support of Cortex-A cores that is being deprecated, but fully compatible features and performance are likely not possible. I would be surprised to see the existing 32-bit RPi OS on any Cortex-M core hardware. Maybe they can develop a cut down version of the RPi OS for their new Cortex-M standard hardware but ARM is not making it easy for them by removing 32-bit support from Cortex-A cores. The 68k AmigaOS scales to the current RP2040 memory footprint and would be usable by moving more of the AmigaOS into NOR flash, but changing from the 68k+chipset to ARM+chipset loses the huge advantage of standard hardware with a large software library for such a small footprint.
OneTimer1 Quote:
I made my own tests on an RPi4 under Linux, and 32-bit applications were 1-3% faster than 64-bit. So as long as you don't need 64-bit, stay with 32-bit; and 2 GB of RAM is a lot for many embedded applications.
|
So with a 2GiB memory RPi4 system, 32-bit Thumb(-2) still offers 1-3% better performance compared to 64-bit AArch64?
Too bad there is not an easy option to make better use of the hardware upgrade with 32-bit pointers, like x32.
https://en.wikipedia.org/wiki/X32_ABI#Details Quote:
Though the x32 ABI limits the program to a virtual address space of 4 GiB, it also decreases the memory footprint of the program by making pointers smaller. This can allow it to run faster by fitting more code and more data into cache. The best results during testing were with the 181.mcf SPEC CPU 2000 benchmark, in which the x32 ABI version was 40% faster than the x86-64 version. On average, x32 is 5–8% faster on the SPEC CPU integer benchmarks compared to x86-64. There is no speed advantage over x86-64 in the SPEC CPU floating-point benchmarks. There are also some application benchmarks that demonstrate the advantages of the x32 ABI.
|
https://wiki.debian.org/X32Port Quote:
X32Port
X32 is an ABI for amd64/x86_64 CPUs using 32-bit integers, longs and pointers. The purpose is to combine the smaller memory and cache footprint from 32-bit data types with the larger register set of x86_64.
There are three principal use cases:
- vserver hosting (memory bound)
- netbooks/tablets (low memory, want performance)
- scientific tasks (want every % of performance)
Compared to amd64, x32 offers significant memory savings, often on the order of 30%, and modest efficiency gains. The 64-bit registers can make computation more efficient. Since 8 additional registers are available, there is less register pressure compared to i386/i686.
Compared to i386, speed increases are more pronounced, especially in code that's under register pressure or operates on 64-bit or floating-point variables. It also avoids i386's penalty for PIC code, where EBX is essentially reserved for the Global Offset Table (GOT).
|
Thumb-2 is actually less handicapped by the lack of GP registers than x86, which really only has 6 usable orthogonal GP registers for programs to use (EBX and ESP are not general purpose and lack orthogonality). 32-bit ARM loses the PC and LR registers so really only has 14 GP registers, compared to the 68k's 16 GP registers; the 68k moves the PC out of the orthogonal encodings and does not have a LR register. The 68k SP is mostly orthogonal except for 8-bit stores, which are padded to 16-bit. The 68k has better PC relative addressing support than x86 too. The 68k ISA is not as handicapped as the x86 and Thumb-2 ISAs.
michalsc (AROS Core Developer; Joined: 14-Jun-2005; Posts: 440; From: Germany)
Re: The (Microprocessors) Code Density Hangout
Posted on 24-Jun-2025 6:21:29 [ #294 ]

@matthey
Quote:
Changing existing datatype sizes (32-bit long to 64-bit long) is a pain and it is better to define and use new ones like 64-bit long long. |
For quite a long time we have had architecture-agnostic fixed-size data types; not using them is a source of trouble when switching between architectures.
Quote:
Cortex-M cores at the low end are MCU only and do not support all of Thumb and Thumb-2 ISAs |
That's why I wrote "variant of Thumb32". Besides, in the embedded field you do not really care whether it is a subset of Thumb32 or full T32, since you build the code base to match the very product you are building. In such cases you either take e.g. an M-profile CPU, which is a subset of T32 with good code density, or an A-/R-profile CPU and add the amount of memory you need. Thanks to proper testing you ensure that the embedded system will not run out of memory anyway, so who cares?
Quote:
I would be surprised to see the existing 32-bit RPi OS on any Cortex-M core hardware |
Why on earth would you want to run a desktop OS on a CPU meant for embedded purposes?
Quote:
Maybe they can develop a cut down version of the RPi OS for their new Cortex-M standard hardware but ARM is not making it easy for them by removing 32-bit support from Cortex-A cores. |
I have a feeling you are badly mixing things up... |
matthey (Elite Member; Joined: 14-Mar-2007; Posts: 2756; From: Kansas)
Re: The (Microprocessors) Code Density Hangout
Posted on 24-Jun-2025 15:47:57 [ #295 ]

michalsc Quote:
For quite a long time we have had architecture-agnostic fixed-size data types; not using them is a source of trouble when switching between architectures.
|
The integer datatypes new in C99 are useful and can eliminate variable-size integer problems. Much of Linux and some other code was written before C99 and never updated. Careful programming should be able to avoid problems without the C99 integer datatypes, but there are nuances and higher maintenance is required. There would have been less difference between x32 and x86-64 if long integers had remained the same size for Linux. The pointer size change is necessary, but none of the other datatypes needed to change.
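A minimal sketch of the C99 fix:

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t  a = 42;              /* always 4 bytes, on every ABI */
        int64_t  b = 42;              /* always 8 bytes, on every ABI */
        intptr_t p = (intptr_t)&a;    /* tracks the pointer width     */
        printf("%" PRId32 " %" PRId64 " ptr=%zu\n", a, b, sizeof(p));
        return 0;
    }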
michalsc Quote:
That's why I wrote "variant of Thumb32". Besides, in the embedded field you do not really care whether it is a subset of Thumb32 or full T32, since you build the code base to match the very product you are building. In such cases you either take e.g. an M-profile CPU, which is a subset of T32 with good code density, or an A-/R-profile CPU and add the amount of memory you need. Thanks to proper testing you ensure that the embedded system will not run out of memory anyway, so who cares?
|
Some embedded hardware requires or gains from a la carte customizations. A large and growing percentage of embedded hardware can use standard hardware though. This is because more functionality can be provided practically for free and newer chip fab processes reduce power enough for all but the lowest power applications like battery operated devices. Economies of scale for standard embedded hardware can reduce hardware prices more than limited production custom hardware with fewer features.
michalsc Quote:
Why on earth would you want to run a desktop OS on a CPU meant for embedded purposes?
|
RPi hardware is not desktop hardware. The only desktop ARM hardware is for the Apple Mac. RPi hardware is standard embedded hardware which Linux targets. It is easier to use a standard OS on standard embedded hardware but some OS developers prioritize the desktop at the expense of small footprint embedded hardware. Some compiler developers prioritize the desktop and now even ARM prioritizes the desktop with the deprecation of 32-bit ARM support for Cortex-A cores. The original small footprint standard embedded hardware market which RPi pioneered is in danger of being lost. They have successfully scaled down to 32-bit Cortex-M Thumb MCUs, without standard fat Linux, and are being forced to scale up to 64-bit Cortex-A hardware but there will be a large gap in the RPi product range where the original 32-bit RPi hardware started.
The 68k Amiga standard is perfect for filling the gap being abandoned by ARM 32-bit Cortex-A support. The 68k Amiga originated using 256kiB of memory although 1-2MiB retains a much larger portion of the Amiga software library. I would not scale the 68k Amiga as low as the RP2350 with 2xCortex-M33@150 MHz, 520kiB SRAM and no digital display output. The 68k Amiga strength is with a small footprint GUI at a larger footprint that is still considered tiny compared to a 64-bit ARM Cortex-A footprint. The 68k AmigaOS scales lower than popular standard Linux distributions and leaves more memory available for programs. I believe the 68k can have more single core performance than Cortex-M cores too. The only Cortex-M core which is superscalar is the older 6-stage Cortex-M7 where the newer scalar 7-stage Cortex-M85 with Armv8.1-M support would likely be chosen for performance use. I believe a modernized 8-stage superscalar 68060 would have better performance/MHz and could even compete with at least the Cortex-A55 in performance/MHz judging by the performance of SiFive series 7 cores with a CISC like design similar to the 68060. The 2-way superscalar 32-bit 68060 CPU uses ~2.5 million transistors while the lowest end 64-bit 2-way superscalar Cortex-A53 core uses ~12.5 million transistors. A 32-bit in-order 2-way Cortex-A7 core predecessor uses ~10 million transistors so a 64-bit equivalent Cortex-A53 core uses ~25% more transistors. The 64-bit tax applies to more than just memory.
michalsc Quote:
I have a feeling you are awfully mixing things...
|
Not at all. RPi may never try to provide a standard OS for their MCUs as there are few OSs that scale down that far. They knew Linux was too fat, but RISC OS Open did toy with scaling RISC OS down as "RISC OS Pico".
https://www.riscosopen.org/news/articles/2014/05/01/happy-birthday-basic Quote:
Happy birthday, BASIC
To celebrate the 50th birthday of the BASIC programming language, we are very pleased to announce the release of “RISC OS Pico”. This is a very cut-down RISC OS for the Raspberry Pi that boots directly into BASIC, just like on the Beeb.
RISC OS Pico is available as a free download that can be installed onto pretty much any SD card, or you can buy it pre-installed from us for just a fiver.
BASIC was first unleashed upon the programming world in 1964, in a version known as Dartmouth BASIC. It became particularly popular in the 1980s when it was adopted as the built-in language of choice for the new wave of home computers. And of course, on Acorn hardware we had BBC BASIC, on the BBC Micro, which was aimed at British schoolchildren. Something which has obvious parallels to the Raspberry Pi today.
BBC BASIC lives on in RISC OS, albeit with many improvements and updates. Find out for yourself with “RISC OS Pico”.
Happy birthday, BASIC – here’s to the next 50 years…
|
Even the AmigaOS may not be possible or useful at the RPi Pico RP2040 MCU spec, and it would feel cramped at the RPi Pico 2 RP2350 MCU spec. Doubling the memory footprint again would allow the 68000+OCS/ECS standard with its software, and doubling that would allow the 68020+AGA standard and software. The Archimedes RISC OS with the fat original ARM ISA does not scale as low as the 68k AmigaOS, does not have as much software, and the standard ARM hardware, including the CPU ISA, changed underneath it. Modern ARM changes have made it difficult for RISC OS to maintain compatibility on modern hardware. The 26-bit addressing hardware support was dropped long ago and, more recently, the original 32-bit ARM ISA was dropped from Cortex-A cores. Cortex-M cores do not support these features either, as embedded hardware does not need standard features or backward compatibility, according to ARM.

RPi may want these standard features and backward compatibility so they could use a standard RISC OS on tiny-footprint hardware, but ARM controls their destiny. Producing standard 68k Amiga small-footprint hardware again is not easy but likely possible; standard and backward-compatible features can be maintained, and the destiny is no longer controlled by a business with the opposite philosophy. The RPi philosophy is to use standard embedded hardware to improve economies of scale, while the ARM philosophy is to drop standard and compatibility features to gain every last hardware advantage in competing with x86-64. Ironically, it is standard hardware with backward compatibility that maintains the huge software advantage for the x86-64 desktop too.
Last edited by matthey on 24-Jun-2025 at 04:24 PM.
|
| Status: Offline |
| | OneTimer1
|  |
Re: The (Microprocessors) Code Density Hangout Posted on 25-Jun-2025 13:03:55
| | [ #296 ] |
| |
 |
Super Member  |
Joined: 3-Aug-2015 Posts: 1263
From: Germany | | |
|
| @matthey
Quote:
matthey wrote:
Not at all. RPi may never try to provide a standard OS for their MCUs as ...
|
Their standard Linux variant is Debian but they also provide other variants like Ubuntu on their website.
Last edited by OneTimer1 on 25-Jun-2025 at 07:28 PM. Last edited by OneTimer1 on 25-Jun-2025 at 01:19 PM.
|
| Status: Offline |
| | matthey
|  |
Re: The (Microprocessors) Code Density Hangout Posted on 25-Jun-2025 21:02:07
| | [ #297 ] |
| |
 |
Elite Member  |
Joined: 14-Mar-2007 Posts: 2756
From: Kansas | | |
|
| OneTimer1 Quote:
Their standard Linux variant is Debian but they also provide other variants like Ubuntu on their website. |
RPi develops the RPi OS and likely has minimal involvement in making the other RPi images available. My understanding of the Debian lineage is as follows.
Raspberry Pi OS 32-bit is based on Raspbian Linux, which is based on Debian Linux
Raspberry Pi OS 64-bit is based on Debian Linux
The Raspberry Pi OS was previously called Raspbian but, since the 64-bit version is not based on Raspbian, both versions were renamed Raspberry Pi OS. RPi chose lightweight versions of Linux components to work better on their small-footprint hardware. For example, their UI is based on the lightweight LXDE and is called PIXEL (Pi Improved Xwindows Environment, Lightweight). Debian likely has a smaller memory footprint than Ubuntu too.
https://www.reddit.com/r/linux/comments/5l39tz/linux_distros_ram_consumption_comparison_updated/ https://www.androidauthority.com/linux-distro-least-ram-3489365/
Debian using XFCE was the best and Ubuntu using Unity the worst for memory footprint in the comparison of popular Linux distros (1st link above). Debian using Gnome had the 2nd worst memory footprint in the 2nd comparison, which included more lightweight distros (2nd link above). Debian-based RPi OS and Ubuntu are likely the only Linux variants with more than 5% OS share on RPi hardware.
https://rpi-imager-stats.raspberrypi.com/
I can not fully explain the difference in memory footprint between the Linux footprint comparisons. The 1st has Debian using 208MiB of memory after boot while the 2nd has it using 1055MiB; one plausible cause, sketched after the quote below, is whether buffers and cache are counted as used. I expect the 1st comparison is closer, and it states the test was done using 64-bit with 1GiB of memory. I found older RPi 2 memory footprint info using 32-bit with 1GiB of memory.
https://distrowatch.com/weekly.php?issue=20150622#raspbian Quote:
With the LXDE desktop running, Raspbian required approximately 250MB of memory and, when including cache, typically consumed most of its 1GB of RAM. However, after I removed the LXDE desktop and the X display server, replacing them with ZFS and DenyHosts, Raspbian's memory footprint dropped to 76MB of memory. Including cached data, my Pi is using a mere 155MB of memory. Most of the time the Pi's processor is idle. Early on I tried to put a heavy load on the device's four CPU cores and found Raspbian continued to work smoothly and the device remained cool to the touch. Once my tests were completed, I found Raspbian usually carried a load average of about 0.01 and the Pi responds faster to remote connections than my old single-core backup box did.
|
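As mentioned above, one plausible source of such discrepancies is whether a tool counts buffers and cache as used memory. A minimal sketch, assuming a Linux /proc/meminfo with the usual MemTotal, MemFree and MemAvailable fields (the meminfo_kib helper is just for illustration):

#include <stdio.h>
#include <string.h>

/* Read one kiB value from /proc/meminfo by field name, e.g. "MemTotal:". */
static long meminfo_kib(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[128];
    long value = -1;
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, field, strlen(field)) == 0) {
            sscanf(line + strlen(field), "%ld", &value);
            break;
        }
    }
    fclose(f);
    return value;
}

int main(void)
{
    long total = meminfo_kib("MemTotal:");
    long mfree = meminfo_kib("MemFree:");
    long avail = meminfo_kib("MemAvailable:");
    if (total < 0 || mfree < 0 || avail < 0)
        return 1;
    /* Counting cache as used makes the footprint look several times larger. */
    printf("used incl. cache: %ld MiB\n", (total - mfree) / 1024);
    /* MemAvailable-based used memory is closer to the real OS footprint. */
    printf("used excl. cache: %ld MiB\n", (total - avail) / 1024);
    return 0;
}

Two reviewers running these two calculations on the same idle system could easily report numbers as far apart as the 208MiB and 1055MiB figures above.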
The original RPi with 256MiB of memory must have been crippled while using a GUI, and the scalar ARM11 CPU had anemic performance. The very low price allowed the RPi to succeed and sell ~68 million units. Aggressively pushing down the price and the memory footprints of the hardware and OS led to success. Completely ignoring the production of competitive hardware with a small memory footprint led to the PPC AmigaOne selling maybe 5,000 units, even while the Amiga market was protected from competition.
It may be easier to compare non-GUI memory footprints, as the GUI desktop footprint varies with screen/window memory, backdrop memory, taskbar memory, icons and other GUI-related programs that may be launched at startup. A 1080p screen (1920x1080) in 32-bit true color would use nearly 8MiB of memory, but this still does not explain large Linux footprints. The 32-bit RPi still uses ~76MiB to boot to a CLI. We did not consider the screen when looking at the 68k Amiga footprint.
68k AmigaOS 3 used 54kiB of 2MiB of memory after boot, or 55,296B
floppy drive defaults to 5x512B buffers using 2,560B
Amiga defaults to a 640x200 screen with 4 colors using 32,000B
---
~20,736B used by the AmigaOS excluding floppy drive buffers and screen/window bitmap
The 68k Amiga memory footprint was not even small compared to 8-bit computers with character/tile based graphics, but the 68k Amiga used a large flat address space, preemptive multitasking and bitmapped graphics, and was much more dynamic than 8-bit systems. How good a memory footprint seems is relative to what you are used to.
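To make the screen bitmap arithmetic above concrete, a small sketch computing both footprints (the Amiga figure assumes 4 colors = 2 bitplanes at 1 bit per pixel per plane):

#include <stdio.h>

int main(void)
{
    /* 1080p in 32-bit true color: 4 bytes per pixel */
    long pc = 1920L * 1080L * 4L;          /* 8,294,400B, just under 8MiB */
    /* Amiga 640x200 in 4 colors: 2 bitplanes of 640x200 bits each */
    long amiga = (640L / 8L) * 200L * 2L;  /* 32,000B */
    printf("1080p true color screen: %ldB (%.1fMiB)\n", pc, pc / (1024.0 * 1024.0));
    printf("Amiga 640x200x4 screen:  %ldB\n", amiga);
    return 0;
}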
https://www.androidauthority.com/linux-distro-least-ram-3489365/ Quote:
What is the best Linux distro for low RAM setups?
In short, containers are the lightest option if you’re looking for Linux distros that use the least amount of RAM, especially for command-line tasks. Dedicated command-line Linux versions are resource-efficient and ideal for servers. For general users, lightweight desktop interfaces, like XFCE and LXQT, help reduce resource consumption, but applications, especially web browsers, are the primary memory consumers. Nevertheless, it’s impressive that all Linux distros I’ve tested used less than 2GB of RAM, even with the web browser test running.
|
https://www.amigans.net/modules/newbb/viewtopic.php?post_id=152300#forumpost152300 joerg Quote:
Compared to real PPC hardware like the X1000 and X5000, or even just a Sam4x0, running AmigaOS 4.x with QEmu on a Raspberry Pi is unusably slow. However, it's a replacement board for the A1200, and even a QEmu-emulated AmigaOne XE/PPC running AmigaOS 4.x on it should be faster than classic Amiga AmigaOS 4.x on real hardware (A1200 with BlizzardPPC), and with 4GB you have enough RAM for running AmigaOS 4.x software, compared to the BlizzardPPC's maximum of 256MB, which is far too little. Additionally, the RAM access speed on the BlizzardPPC is extremely slow, which was the main reason AmigaOS 4.x was unusable on the A1200/BPPC.
The A600GS makes much more sense.
|
Some people get accustomed to the bloat and forget the past. Eben Upton seems to remember his small-memory-footprint Amiga 600, even though he was unsuccessful at scaling the RPi below 256MiB of memory and gave that footprint up for 512MiB-minimum SBCs that could more comfortably run Linux with a GUI. It was enough for Eben to successfully carve out a market niche selling ~68 million units, even though he did not get close to the 68k Amiga memory footprint.
Last edited by matthey on 25-Jun-2025 at 09:20 PM. Last edited by matthey on 25-Jun-2025 at 09:15 PM. Last edited by matthey on 25-Jun-2025 at 09:12 PM.
|
| Status: Offline |
| | Hammer
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 26-Jun-2025 11:11:22
| | [ #298 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6505
From: Australia | | |
|
| @matthey
Quote:
RPi hardware is not desktop hardware. The only desktop ARM hardware is for the Apple Mac.
|
Qualcomm's Snapdragon X Elite with Oryon cores also targets the desktop and is ARM (v8.7-A) compatible.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 26-Jun-2025 11:55:13
| | [ #299 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6505
From: Australia | | |
|
| @matthey
Quote:
Thumb-2 is actually less handicapped by the lack of GP registers than x86, which really only has 6 usable orthogonal GP registers for programs to use (EBP and ESP are not general purpose and lack orthogonality). ARM 32-bit loses the PC and LR, so it really only has 14 GP registers, compared to the 68k's 16 GP registers; the 68k moves the PC out of the orthogonal encodings and does not have a LR register. The 68k SP is mostly orthogonal except for 8-bit stores, which are padded to 16-bit. The 68k has better PC-relative addressing support than x86 too. The 68k ISA is not as handicapped as the x86 and Thumb-2 ISAs.
|
1. For the NXP/ST-Micro camp, PPC with the 16-bit VLE extension is the main competition against the 68K.
2. IA-32's x87 supports integer formats thanks to the 8087's support for INT32 and INT64, in addition to the FP32, FP64 and FP80 data formats.
For the 8086 and 8088, Intel addressed the 68000's INT32 advantage via the 8087's INT32 support.
IA-32 includes 8 x87/MMX and 8 XMM (SSE2) registers. SSE2 supports scalar and vector integers.
Unlike the 68060 (whose LC/EC variants omit the FPU), every Pentium guarantees the x87's existence.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |
| | Hammer
 |  |
Re: The (Microprocessors) Code Density Hangout Posted on 26-Jun-2025 12:02:42
| | [ #300 ] |
| |
 |
Elite Member  |
Joined: 9-Mar-2003 Posts: 6505
From: Australia | | |
|
| @cdimauro
Quote:
cdimauro wrote: I've already and IMMEDIATELY reported this news once it was published AND added my comment on that (NOT in favour of Intel), idiot!
|
You're the real idiot.
_________________ Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68) Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68) Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB |
| Status: Offline |