cdimauro 
The (Microprocessors) Code Density Hangout
Posted on 1-May-2021 5:39:00
#1 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

Since we talk about code density from time to time, I'd like to open a thread in this section to collect all information, instead of having it spread around in several threads (which also disappear and are difficult to retrieve).

@matthey: I think that you're the most competent and active expert about this topic. Would you like to contribute?

My idea is to create some comments at the beginning of the thread that cover specific topics and get updated from time to time. Example:
- General information (what's code density, why it's important, etc.)
- Benchmarks (state of the art)
- Compilers (which ones are best)
- Literature (books, academic papers, web sites)
- Microprocessors (general information like if the ISA is more or less oriented to code density, if it has specific execution modes for compact code, if it has specific extensions for compact code, etc.)
- Motorola 68K corner (anything which is useful about this family, which is not covered by other topics)
- Intel IA-32/x86 / AMD x86-64/x64 corner (...)
- ARM corner (...)
- RISC-V corner (...)
- SuperH / J2 corner (...)
- Embedded corner (...)
- another 5 empty comments for topics which may pop up in the future.

@matthey can you take care of creating and updating those comments? As I said, you're the most competent and also active, so I think that you can contribute much better than me.

A request from my side: could you please share your updated 68K source for this: http://deater.net/weave/vmwprod/asm/ll/ll.html ? Looking at your tables in Google Documents, the numbers don't match what Mr. Weaver published at that link.

noXLar 
Re: The (Microprocessors) Code Density Hangout
Posted on 1-May-2021 20:32:39
#2 ]
Cult Member
Joined: 8-May-2003
Posts: 737
From: Norway

@cdimauro

wow, are you back:)

nice to see you bro! hope you will supply site with lots of interesting things:)

_________________
nox's in the house!

matthey 
Re: The (Microprocessors) Code Density Hangout
Posted on 2-May-2021 0:06:06
#3 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2883
From: Kansas

cdimauro Quote:

Since we talk about code density from time to time, I'd like to open a thread in this section to collect all information, instead of having it spread around in several threads (which also disappear and are difficult to retrieve).

@matthey: I think that you're the most competent and active expert about this topic. Would you like to contribute?

...

@matthey can you take care of creating and updating those comments? As I said, you're the most competent and also active, so I think that you can contribute much better than me.


The topic is a nice idea but I get the feeling that most people here really don't want to know about code density so I'm not very motivated. I really should stop wasting my time here but the interesting court case lured me in.

cdimauro Quote:

A request from my side: could you please share your updated 68K source for this: http://deater.net/weave/vmwprod/asm/ll/ll.html ? Looking at your tables in Google Documents, the numbers don't match what Mr. Weaver published at that link.


The last update I sent Dr. Weaver was mostly an improvement of the decompression code by ross on EAB. The assembler files are available on the last 2 pages of the following thread.

http://eab.abime.net/showthread.php?s=e81df9c472e296778e1c2996bf076333&t=85855&page=8

Notice that ross complains in the last post about the statistics not being updated.

ross Quote:

Just noticed that my 54 byte version is not signaled.
http://www.deater.net/weave/vmwprod/asm/ll/
(best is the 56 bytes 8086 version)

68k deserve the throne

cdimauro 
General information
Posted on 2-May-2021 5:34:39
#4 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Benchmarks
Posted on 2-May-2021 5:35:05
#5 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

Some benchmarks from The Totally Unscientific Code Density Competition! (compiling the OSD control module from one of the many Minimig variants; sizes in bytes):

OpenRISC - 81376
MIPS (f32c) - 71356
RISC-V - 69936
ZPU - 68868
ARM - 67952
X86-64 - 66112
m68k (68000) - 65760
i386 - 64080
832 - 63599
RISC-V compressed - 57780
MIPS16 - 54192
ARM Thumb - 51436
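
To put the list above in relative terms, here is a minimal C sketch that expresses one size as a percentage above another. The helper name percent_larger is made up for illustration and is not part of the original benchmark; the sizes passed to it are simply copied from the list.

```c
#include <assert.h>

/* Hypothetical helper (not from the benchmark itself): returns how much
   larger `size` is than `baseline`, in percent. 0.0 means equal size,
   a negative value means `size` is smaller than `baseline`. */
double percent_larger(unsigned long size, unsigned long baseline)
{
    assert(baseline > 0); /* a zero baseline makes the ratio meaningless */
    return ((double)size - (double)baseline) * 100.0 / (double)baseline;
}
```

With the numbers above, percent_larger(65760, 51436) puts m68k (68000) at roughly 27.8% larger than ARM Thumb, and percent_larger(81376, 51436) puts OpenRISC at roughly 58.2% larger.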


Some benchmarks from Code Density Compared Between Way Too Many Instruction Sets (a buildroot run. NOTE: the options used for the embedded Linux compilations are NOT clear):
Microblaze LE - 1,223,148 bytes - No target specific settings available
Xtensa - 1,216,228 bytes - fsf target
OpenRISC - 1,133,164 bytes - No target specific settings available
MIPS32 LE - 1,017,196 bytes - P5600 target, no softfloat
SPARC64 - 997,624 bytes - SPARCv9
MIPS64 LE - 989,024 bytes - P6600 target, no softfloat, n32 ABI
PPC64LE - 985,552 bytes - Power8 target
PPC32 - 984,396 bytes - 476FP target
RV64G - 934,368 bytes - RV64G platform defaults
SPARC32 - 931,064 bytes - SPARCv8
S390X - 907,712 bytes - z15 target
Nios II - 892,316 bytes - No target specific settings available
NDS32 - 888,124 bytes - No target specific settings available
SH-4A LE - 842,884 bytes - No target specific settings
ARM64 LE - 779,936 bytes - Cortex-A76 target, FP-ARMv8
x86_64 - 747,224 bytes - Haswell target
RV64GC - 741,856 bytes - ilp64d ABI
RV32GC - 719,916 bytes - ilp32d ABI
x86 - 713,916 bytes - i686 target
m68k - 698,776 bytes - M68040 target
ARM Thumb1 LE - 632,004 bytes - Thumb, softfloat, ARM926T target
ARC LE - 623,912 bytes - 8K pages, HS38 Quad MAC + FPU target
ARM Thumb2 LE - 599,248 bytes - Thumb2, VFPv4-D16, Cortex-A7 target


Some benchmarks from the SPARC16: A new compression approach for the SPARC architecture:
SPARC16-A-new-compression-approach-for-the-SPARC-architecture


Some benchmarks from High-Performance Extendable Instruction Set Computing (benchmark tests unknown):
High-Performance-Extendable-Instruction-Set-Computing


Some benchmarks from Comparative Architectures, CST Part II, 16 lectures, Lent Term 2005 (Ian Pratt) (Old versions of GCC, GCC-C++, PGP):
Comparative-Architectures-CST-Part-II-16-lectures-Lent-Term-2005-Ian-Pratt


Some benchmarks from Enhancing the RISC-V Instruction Set Architecture (SPEC2006 and CSiBE test suites used with GCC):
SPEC2006
Enhancing-the-RISC-V-Instruction-Set-Architecture-1-SPEC2006

SPEC2006 - Compressed vs Base ISA
Enhancing-the-RISC-V-Instruction-Set-Architecture-2-SPEC2006-Compressed-vs-Base-ISA

CSiBE
Enhancing-the-RISC-V-Instruction-Set-Architecture-3-CSi-BE

Average instruction length
Enhancing-the-RISC-V-Instruction-Set-Architecture-4-Average-instruction-length

Normalised instruction counts
Enhancing-the-RISC-V-Instruction-Set-Architecture-5-Normalised-instruction-counts


Some benchmarks from System-on-Chip Design with Arm® Cortex®-M Processors - Reference Book:
Core-Mark-benchmark-ARM-Cortex-M0-vs-PIC-24-PIC18-RL78


Some benchmarks from AnandTech's MIPS Announces I7200 32-bit CPU With New nanoMIPS ISA:



Some benchmarks from RISC-V's compilation of Zephyr using the Compression extension + PULP extension:
Zephyr-RISC-V-vs-ARM


Some benchmarks from Increasing the Code Density of Embedded RISC Applications:



Last edited by cdimauro on 14-Jun-2026 at 07:26 PM.
Last edited by cdimauro on 21-Aug-2025 at 04:59 AM.
Last edited by cdimauro on 22-Jun-2025 at 05:00 AM.
Last edited by cdimauro on 03-Oct-2022 at 04:57 PM.
Last edited by cdimauro on 02-Oct-2022 at 07:43 PM.
Last edited by cdimauro on 02-Oct-2022 at 07:24 PM.
Last edited by cdimauro on 19-Sep-2022 at 08:47 PM.
Last edited by cdimauro on 18-Sep-2022 at 05:01 AM.

cdimauro 
Compilers
Posted on 2-May-2021 5:35:23
#6 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Literature
Posted on 2-May-2021 5:35:37
#7 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

1987
The effect of instruction set complexity on program size and memory performance
One potential disadvantage of a machine with a reduced instruction set is that object programs may be substantially larger than those for a machine with a richer, more complex instruction set. The main reason is that a small instruction set will require more instructions to implement the same function. In addition, the tendency of RISC machines to use fixed length instructions with a few instruction formats also increases object program size. It has been conjectured that the resulting larger programs could adversely affect memory performance and bus traffic.
[...]
With all other factors constant, we found that simple instruction sets can result in programs that require two and a half times more memory than the same programs on a machine with a complex instruction set. Our evaluation of cache performance showed that for small cache sizes, instruction set complexity severely affected the miss ratio. Fortunately, this aspect of performance can be corrected through the use of large caches. Finally we examined the amount of bus traffic on the three machines. Even with large caches (>64k), a machine with a simple instruction set can expect to generate twice as much bus traffic as a machine with a complex instruction set. Overcoming the potential performance bottleneck caused by the increased bus traffic will require innovative high performance memory systems.



1989
The impact of code density on instruction cache performance
The widespread use of reduced-instruction-set computers has generated a lot of interest in the tradeoff between the density of an instruction set and the size of the instruction cache. In this paper we present and justify a method that predicts the cache performance for a wide range of architectures, based on the miss rate for a single architecture. When we apply the method to a number of cache organizations we find that changes in code density can have a dramatic impact on memory traffic, but that modest improvements in code density do not reduce program execution time significantly in a well-balanced system.
[...]
We confirmed earlier observations that relatively moderate changes in code density (e.g. 35%) can sometimes have a dramatic impact on instruction traffic (e.g. factor of 2-3), but we also observed that the impact of code density on the traffic ratio is most significant for larger caches, i.e. when the instruction traffic is low and when instruction cache misses contribute little to the program execution time.



1991
Methods for Saving and Restoring Register Values across Function Calls
This paper describes the results of a set of controlled experiments that were used to evaluate several methods for saving and restoring registers on CISC machines. Our experiments show that a hybrid approach, a combination of callee and caller methods, produces the most effective code
[...]
For these experiments, we measured the number of instructions executed, the number of memory references, and the size of the object code. The experiments were performed on a VAX-11 and a Motorola 68020.



1992
Executing compressed programs on an embedded RISC architecture
The difference in code size between RISC and CISC processors appears to be a significant factor limiting the use of RISC architectures in embedded systems. Fortunately, RISC programs can be effectively compressed. An ideal solution is to design a RISC system that can directly execute compressed programs. A new RISC system architecture called a Compressed Code RISC Processor is presented. This processor depends on a code-expanding instruction cache to manage compressed programs. The compression is transparent to the processor since all instructions are executed from cache. Experimental simulations show that a significant degree of compression can be achieved from a fixed encoding scheme. The impact on system performance is slight and for some memory implementations the reduced memory bandwidth actually increases performance.



1997
Improving Code Density Using Compression Techniques
We propose a method for compressing programs in embedded processors where instruction memory size dominates cost. A post-compilation analyzer examines a program and replaces common sequences of instructions with a single instruction codeword. A microprocessor executes the compressed instruction sequences by fetching codewords from the instruction memory, expanding them back to the original sequence of instructions in the decode stage, and issuing them to the execution stages. We apply our technique to the PowerPC instruction set and achieve 30% to 50% reduction in size for SPEC CINT95
[...]
Our compression ratio is similar to that achieved by Thumb and MIPS16. While Thumb and MIPS16 designed a completely new instruction set, compiler, and instruction decoder, we achieved our results only by processing compiled object code and slightly modifying the instruction fetch mechanism.
There are several ways that our compression method can be improved. First, the compiler could attempt to produce instructions with similar byte sequences so they could be more easily compressed. One way to accomplish this is by allocating registers so that common sequences of instructions use the same registers. Another way is to generate more generalized STDS code sequences. These would be less efficient, but would be semantically correct in a larger variety of circumstances. For example, in most optimizing compilers, the function prologue sequence might save only those registers which are modified within the body of the function. If the prologue sequence were standardized to always save all registers, then all instructions of the sequence could be compressed to a single codeword. This space saving optimization would decrease code size at the expense of execution time. Table 3 shows that the prologue and epilogue combined typically account for 12% of the program size, so this type of compression would provide significant size reduction.



2001
High-performance extendable instruction set computing
In this paper, a new architecture called the extendable instruction set computer (EISC) is introduced that addresses the issues of memory size and performance in embedded microprocessor systems. The architecture exhibits an efficient fixed length 16-bit instruction set with short length offset and immediate operands. The offset and immediate operands can be extended to 32 bits via the operation of an extension flag. The code density of the EISC instruction set and its memory transfer performance is shown to be significantly higher than current architectures making it a suitable candidate for the next generation of embedded computer systems. The compact EISC instruction set introduces data dependencies that seemingly limit deep pipeline and superscalar implementations. This paper suggests a mechanism by which these dependencies might be removed in hardware.
[...]
the EISC architecture has achieved in the order of 140 to 220% better code density than existing RISC processors and a figure of about 120 to 140% compared to CISC machines. Even compared to 16-bit compressed instruction RISC machines such as the ARM-7TDMI, the program size of the EISC has been found to be 5 to 15% smaller and the frequency of Load and Store instructions about 15% lower.



2003
Reducing code size with echo instructions
In this paper, we examine an executable form of program compression using echo instructions.
With echo instructions, two or more similar, but not necessarily identical, sections of code can be reduced to a single copy of the repeating code. The single copy is left in the location of one of the original sections of the code. All the other sections are replaced with a single echo instruction that tells the processor to execute a subset of the instructions from the single copy.
[...]
Given a highly optimized binary, our results show that traditional software based procedural abstraction achieves a 94.3% compression ratio, while the use of echo instructions achieves a 84.5% compression ratio.
In addition, we evaluate the use of echo instructions with CodePack. CodePack achieved a 70.0% compression ratio on our optimized binaries, and CodePack with echo instructions resulted in a 63.2% compression ratio.
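
A note on reading these figures: in code-compression papers a compression ratio is normally the compressed size divided by the original size, so lower is better. Assuming that convention applies here, a small C sketch (both function names are hypothetical) makes the conversion to a size reduction explicit:

```c
#include <assert.h>

/* Compression ratio under the assumed convention:
   compressed size divided by original size, in percent. Lower is better. */
double compression_ratio_percent(unsigned long compressed, unsigned long original)
{
    assert(original > 0); /* the ratio is undefined for an empty original */
    return (double)compressed * 100.0 / (double)original;
}

/* Size reduction implied by a compression ratio given in percent. */
double size_reduction_percent(double ratio_percent)
{
    return 100.0 - ratio_percent;
}
```

Under that assumption, the 84.5% ratio quoted for echo instructions corresponds to a 15.5% size reduction, the 94.3% for procedural abstraction to only 5.7%, and the 63.2% for CodePack with echo instructions to 36.8%.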



2005
Enhanced code density of embedded CISC processors with echo technology
• An updated study of the code density of IA32 v.s. ARM/THUMB with EEMBC and Spec2kINT benchmarks.
On the average, IA32 code optimized for size is about 16% to 25% smaller than size-optimized ARM code and about 18% to 23% larger than THUMB.
• A demonstration that ET reduces IA32 code size by 17% to 20%. This brings IA32 code to similar code density as THUMB code. Since THUMB often suffers serious performance loss compared to ARM code and a study has shown that ET incurs much smaller performance loss, IA32 with ET presents a significant performance advantage over THUMB.


Improving Program Efficiency by Packing Instructions into Registers
This paper presents a novel architectural and compiler approach to simultaneously reduce power requirements, decrease code size, and improve performance by integrating an instruction register file (IRF) into the architecture. Frequently occurring instructions are placed in the IRF. Multiple entries in the IRF can be referenced by a single packed instruction in ROM or L1 instruction cache.
[...]
We find that a 32 entry IRF provides an average 19% reduction in code size for the embedded applications studied, and much greater compression for some of the larger applications.
[...]
A 32 entry IRF reduces energy consumption of I-Fetch by an average of 37%, which translates to an overall processor energy savings of 15% — and as much as 45% for blowfish, which ran much faster due to a dramatic reduction in IC misses. Energy is saved because over 50% of all instructions were fetched from a very small, low-power register file instead of a larger IC. The utilization of the IC is also improved, lowering miss rate. We expect IRFs to be performance neutral, but consistently find a small savings (and in some applications, a large savings) due to an increased IC locality.



2009
Code Density Concerns for New Architectures
Reducing a program’s instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures.
We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference in code sizes from ISA alone.
We find that the architectural features that contribute most heavily to code density are instruction length, number of registers, availability of a zero register, bit-width, hardware divide units, number of instruction operands, and the availability of unaligned loads and stores.
We extend our results to investigate operating system, compiler, and system library effects on code density. We find that the executable starting address, executable format, and system call interface all affect program size. While ISA effects are important, the efficiency of the entire system stack must be taken into account when developing a new dense instruction set architecture.


SPARC16: A new compression approach for the SPARC architecture
This article proposes to apply a new encoding to the SPARCv8 architecture. Through extensive analysis of a program mix from the Mibench and Mediabench benchmark suites, we suggest a new 16-bit instruction set, easily translated to its 32-bit counterpart during execution time. Using the aforementioned program mix to infer how code could be represented in the proposed 16-bit ISA, compression ratios as low as 56% can be obtained. We also evaluated the cache behavior and showed reductions of 42% on cache misses that can increase performance up to 28% (for patricia program with 2KB cache)


2016
Design and evaluation of compact ISA extensions
This paper proposes a 16-bit extension to the SPARC processor, the SPARC16. Additionally, we provide the first methodology for generating 16-bit ISAs and evaluate compression among different 16-bit extensions. SPARC16 programs can achieve better compression ratios than other extensions, attaining results as low as 67%. Moreover, SPARC16 reduces cache miss rates up to 9%, requiring smaller caches than SPARC processors to achieve the same performance; a cache size reduction that can reach a factor of 16.
[...]
we achieve SPARC16 compression ratios similar to the ones obtained by production quality MIPS and ARM 16-bit extensions, achieving better compression ratios than both in several programs.
Furthermore, we evaluated SPARC16 performance by analyzing effects of code size reduction in the instruction cache. Although the execution of SPARC16 programs yields more instructions than SPARCv8, lower cache miss ratios are achieved by SPARC16 compiled programs; the best ratio achieves 9% reduction in dijkstra from MiBench.



2017
Exploring the Limits of Code Density
I hand-assemble a simple benchmark on a number of architectures with the end goal being the smallest code size. A comparison can then be made of the code density of the architecture. The benchmark is small and simple, so may not give a full accounting of code density for larger benchmarks, but picking a larger benchmark would make the hand-coded assembly task much larger. The benchmark does have some useful routines in it, such as LZSS compression [3, 1], string concatenation, and integer to string conversions.


Reducing calling convention overhead in object-oriented programming on embedded ARM thumb-2 platforms
This paper examines the causes and extent of code size overhead caused by the ARM calling convention in Thumb-2 binaries.
[...]
We show that 4-12% of the instructions in typical ARM Thumb-2 binaries are overhead derived from the calling convention, with the average rate for C++ programs being about 3% higher than for C programs. The overhead percentage is shown to be positively correlated with the average number of call sites per procedure. We demonstrate our optimizer which eliminated 22-31% of the calling convention overhead in C++ programs in our trials.



2020
HW/SW approaches for RISC-V code size reduction
we propose a new RISC-V ISA extension that explicitly targets improved code density, with a special focus on the push/pop instructions, needed to handle multiple stack memory operations. The extension effectively reduces the code density gap, both on the Embench open benchmark (from 11.8% to 7.7%) and also on 3GPP industry-strength code aimed at enabling low power, low data rate machine-to-machine communication (from 11.5% to 5.8%). Finally, we provide an implementation of the ISA extension on the open CV32E40P core to evaluate the impact on the core area and operating frequency. Results show a minimal increase of 2.5% of the core area and no impact on the maximum frequency
[...]
Results show that the new instructions do not impact the maximum frequency, and add only 2.5% of core area (~335 nand2-equivalent gates), providing improvements in stack handling operations in performance, power, and code-size.

It's worth noting here that implementing such very complex instructions has taken just a little more silicon, while providing great general benefits.

Last edited by cdimauro on 14-Jun-2026 at 06:17 PM.
Last edited by cdimauro on 14-Jun-2026 at 07:57 AM.
Last edited by cdimauro on 14-May-2026 at 06:09 AM.
Last edited by cdimauro on 09-May-2026 at 05:26 AM.
Last edited by cdimauro on 03-May-2026 at 06:29 AM.
Last edited by cdimauro on 27-Apr-2026 at 08:35 PM.

cdimauro 
Microprocessors
Posted on 2-May-2021 5:35:52
#8 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Motorola 68K
Posted on 2-May-2021 5:36:22
#9 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

Some instruction statistics from Kickstart 1.2 (count, with percentage of total in parentheses):

1 word opcodes: 1965 (51.0)
2 word opcodes: 1436 (37.2)
3 word opcodes: 267 (6.9)
4 word opcodes: 92 (2.4)
5 word opcodes: 34 (0.9)
6 word opcodes: 7 (0.2)
7 word opcodes: 17 (0.4)
8 word opcodes: 18 (0.5)
9 word opcodes: 3 (0.1)
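
The distribution above can be condensed into a single number, the mean instruction length. Here is a minimal C sketch that uses only the rows listed; mean_length_words and kickstart12 are names made up for this example, and one word is taken to be 16 bits as usual on the 68K.

```c
#include <assert.h>
#include <stddef.h>

/* counts[i] is the number of instructions that are (i + 1) words long.
   Returns the mean instruction length in 16-bit words. */
double mean_length_words(const unsigned long *counts, size_t n)
{
    unsigned long total = 0; /* number of instructions */
    unsigned long words = 0; /* number of 16-bit words they occupy */
    for (size_t i = 0; i < n; i++) {
        total += counts[i];
        words += counts[i] * (unsigned long)(i + 1);
    }
    assert(total > 0); /* an empty histogram has no mean */
    return (double)words / (double)total;
}

/* Kickstart 1.2 counts copied from the list above (1 to 9 words). */
const unsigned long kickstart12[9] = {1965, 1436, 267, 92, 34, 7, 17, 18, 3};
```

Calling mean_length_words(kickstart12, 9) gives about 1.70 words, i.e. roughly 3.4 bytes per instruction for the rows listed.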


And from some applications (only showing instruction lengths that account for 1% or more of the total):
Program 'dis' (my disassembler, asm)
total insts = 23861
1 word insts = 11050, 46%
2 word insts = 10940, 45%
3 word insts = 1834, 7%

Program 'CMP' (asm)
total insts = 1984
1 word insts = 752, 37%
2 word insts = 943, 47%
3 word insts = 217, 10%
4 word insts = 72, 3%

Program 'MBRTest-2' (asm?)
total insts = 25401
1 word insts = 16664, 65%
2 word insts = 7619, 29%
3 word insts = 965, 3%

Program 'Hex'
total insts = 8730
1 word insts = 2935, 33%
2 word insts = 5120, 58%
3 word insts = 646, 7%

Program 'Redit'
total insts = 23984
1 word insts = 8890, 37%
2 word insts = 12025, 50%
3 word insts = 2880, 12%

Program 'cb' (Sierra game "The Colonel's Bequest")
total insts = 1093
1 word insts = 454, 41%
2 word insts = 561, 51%
3 word insts = 65, 5%
4 word insts = 13, 1%

Program 'AIBB'
total insts = 49748
1 word insts = 22517, 45%
2 word insts = 20943, 42%
3 word insts = 5574, 11%

Program 'moned'
total insts = 5136
1 word insts = 2894, 56%
2 word insts = 2023, 39%
3 word insts = 208, 4%

Program 'AmiModRadio'
total insts = 43042
1 word insts = 24239, 56%
2 word insts = 11125, 25%
3 word insts = 7587, 17%

Program 'wolf3d'
total insts = 40359
1 word insts = 20901, 51%
2 word insts = 11938, 29%
3 word insts = 5932, 14%
4 word insts = 1398, 3%

Program 'IBrowse' (sasc)
total insts = 222475
1 word insts = 115443, 51%
2 word insts = 88699, 39%
3 word insts = 15854, 7%
4 word insts = 2397, 1%

Program 'python2.4' (gcc?)
total insts = 175669
1 word insts = 98758, 56%
2 word insts = 55819, 31%
3 word insts = 19977, 11%

Last edited by cdimauro on 10-Sep-2022 at 07:22 AM.

cdimauro 
Intel IA-32/x86 / AMD x86-64/x64
Posted on 2-May-2021 5:36:36
#10 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
ARM
Posted on 2-May-2021 5:36:51
#11 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
RISC-V
Posted on 2-May-2021 5:37:06
#12 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
SuperH / J2
Posted on 2-May-2021 5:37:18
#13 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Embedded
Posted on 2-May-2021 5:37:31
#14 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

CPU Design at SEGGER
S32E — a 32-bit CPU with super high code density
Code density comparison based on the Sieve of Eratosthenes:

// expect: success
// title: sieve benchmark, a larger test of the code generator

#define SIZE 8190

char flags[SIZE+1];

int main(void) {
  int i, prime, k, count, iter;
  //
  for (iter = 1; iter <= 10; iter++) {
    count = 0;
    for (i = 0; i <= SIZE; i++)
      flags[i] = 1;
    for (i = 0; i <= SIZE; i++) {
      if (flags[i]) {
        prime = i + i + 3;
        k = i + prime;
        while (k <= SIZE) {
          flags[k] = 0;
          k += prime;
        }
        count = count + 1;
      }
    }
  }
  //
  return count == 1899 ? 0 : count;
}

Results
RISC-V: 96 bytes (RV32IMAC instruction set)
ARM: 100 bytes (Thumb-2, ARMv7M instruction set)
S32E: 70 bytes

They claim that they could go down by a further 12 bytes (58 bytes total!) by introducing a few single-byte instructions.

Results are really impressive purely looking at the code density, but the price to pay is a very high number of executed instructions (34).

For reference, the results of my new architecture (which uses a novel concept to "pack" several operations into a single instruction, which helps both code density and the number of executed instructions):
Size: 68 bytes. Instructions: 18. # Regular architecture.
Size: 66 bytes. Instructions: 17. # Architecture using block-based vectorization (REP) instructions.
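
The trade-off between size and instruction count can be made explicit by dividing one by the other. Here is a minimal C sketch that takes the sizes and instruction counts quoted above at face value; bytes_per_instruction is a made-up helper name, not something from the SEGGER article.

```c
#include <assert.h>

/* Average encoded size of one instruction, in bytes. */
double bytes_per_instruction(unsigned size_bytes, unsigned instructions)
{
    assert(instructions > 0); /* avoid dividing by zero */
    return (double)size_bytes / (double)instructions;
}
```

This gives about 2.06 bytes per instruction for S32E (70 / 34), against about 3.78 (68 / 18) and 3.88 (66 / 17) for the two variants listed above: many short instructions on one side, fewer but longer ones on the other.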

Last edited by cdimauro on 05-Apr-2026 at 05:25 AM.

cdimauro 
NEx64T
Posted on 2-May-2021 5:38:09
#15 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

Code density statistics for NEx64T (click on the images for better quality):

08-Code-density-improvements

09-Instructions-lengths-distribution-32-bit-mode

10-Instructions-lengths-distribution-64-bit-mode


Instruction execution statistics for NEx64T (click on the images for better quality):
11-Fewer-instructions-executed


32-bit Windows applications used:
arm-none-eabi-gdb.exe
cc1.exe
cc1plus.exe
CrashReportClient_32.exe
Discord.exe
ffmpeg_32.exe
FoxitProxyServer_Socket_RD.exe
GalaxyCommunication.exe
GameClient.exe
Kodi.exe
LauncherPrereqSetup_x64.exe
lto1.exe
MassEffect3.exe
Microsoft.VSCode.CPP.Extension.exe
MomodoraRUtM.exe
Neverwinter.exe
OriginER.exe
OriginThinSetupInternal.exe
ovftool.exe
PlanMaker.exe
Presentations.exe
PS32.exe
Skype.exe
sublime_text_32.exe
TextMaker.exe
The Bard's Tale.exe
TOTALCMD.EXE
UE4Game_32.exe
vmware-remotemks.exe
vs_profiler_x86_enu.exe
WinUAE.exe


64-bit Windows applications used:
aswidsagenta.exe
atio6axx.dll
Code.exe
CrashReportClient_64.exe
EpicGamesLauncher.exe
EXCEL.EXE
fdm.exe
ffmpeg_64.exe
GRAPH.EXE
mame64.exe
mkvinfo.exe
MSACCESS.EXE
MSPUB.EXE
node.exe
OfficeC2RClient.exe
PS64.exe
steamwebhelper.exe
sublime_text_64.exe
TOTALCMD64.EXE
UE4Game_64.exe
VirtualBox.exe
vmware-vmx-debug.exe
vmware-vmx.exe
vsinstr.exe
WinUAE64.exe


Decoder information for NEx64T (click on the images for better quality):
05-Instructions-decoding-only-first-bits-LSBs-matter

06-Simplified-frontend-decoders-keep-only-the-good-parts-of-Intel64-x64

07-New-opportunities-with-simplified-frontend

Last edited by cdimauro on 10-Sep-2022 at 07:53 AM.

cdimauro 
Future Topic
Posted on 2-May-2021 5:38:23
#16 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Future Topic
Posted on 2-May-2021 5:38:34
#17 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Future Topic
Posted on 2-May-2021 5:38:45
#18 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Future Topic
Posted on 2-May-2021 5:38:57
#19 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD

cdimauro 
Future Topic
Posted on 2-May-2021 5:39:11
#20 ]
Elite Member
Joined: 29-Oct-2012
Posts: 4625
From: Germany

TBD
