Forum Index / General Technology (No Console Threads) / The (Microprocessors) Code Density Hangout
Hammer 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 7:30:12
#181 ]
Elite Member
Joined: 9-Mar-2003
Posts: 4595
From: Australia

@Gunnar

Quote:

Gunnar wrote:
@Hammer

Quote:

1. PowerPC architectural differences, including a much higher penalty for float to int conversion on the PPC.


Actually, the float-to-int conversion has no penalty on PPC.
What used to have a penalty was moving the int to an integer register.

The 68K and others have FPU instructions that can read values into the FPU from integer registers, and also instructions to write results back from the FPU into integer registers.

IBM regarded this as unneeded, so the FPU could write results only to an FPU register.
The FPU register could then be stored to a memory location.
There was no instruction available on PPC to directly move an FPU result to an integer register.
This means the FPU needed to move the value to memory, and then the integer unit needed to load it from memory.

For most FPU code this is not a problem at all, and normal academic/industrial algorithms run perfectly without needing this. Sometimes, as in the Quake game, this move is wanted, and then you suffer a delay from the memory access.
Recent POWER CPUs solved this.

Read the whole post for context.
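As an illustration, here is a toy Python model of the round trip described in the quote (the classic 32-bit PPC sequence is roughly fctiwz, then stfd, then lwz; the exact code depends on compiler and ABI, and the function name here is made up):

```python
import struct

def ppc_ftoi(x):
    # fctiwz: convert to a 32-bit integer, truncating toward zero;
    # the result lands in the low word of a 64-bit FPU register image
    fpr = struct.pack('>q', int(x))
    # stfd: the FPR has to be spilled to a memory location...
    mem = fpr
    # lwz: ...and the integer unit loads the low 32 bits back from memory
    return struct.unpack('>i', mem[4:8])[0]
```

The struct round trip stands in for the store/load pair: the integer result exists only inside an FPU register image until it passes through memory, which is exactly the delay being discussed.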

From NVIDIA's presentation for RTX Turing on integer vs. floating-point instruction usage statistics:


Modern 3D games still use a mix of integer and floating-point operations.

Last edited by Hammer on 28-Sep-2022 at 07:33 AM.

_________________
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 3080 Ti
Ryzen 9 7900X, DDR5-5600 32 GB RAM, GeForce RTX 3080 Ti
Amiga 1200 (rev 1D1, KS 3.2, TF1260, 68060 @ 63 Mhz, 128 MB)
Amiga 500 (rev 6A, KS 3.2, PiStorm/RPi3a/Emu68)

Gunnar 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 11:55:02
#182 ]
Regular Member
Joined: 25-Sep-2022
Posts: 152
From: Unknown

Quote:

cdimauro wrote:
Quote:

Gunnar wrote:
@cdimauro

We all know that you are a self-proclaimed Amiga hardware and CPU expert.

Am I not?


No, you are not!

You would like us to think that you are one,
but you are absolutely not. You have literally no clue about what you are talking about.

If you are an Amiga and CPU expert, where are the many super demos and games you wrote in the last 5 years? How many Amiga games did you write?
None?

And why did you not write any?
Because you can't!

All you can do is post clueless nonsense in forums.
You are only a pretender, not someone with real knowledge.

I know a school kid that wrote tons of demos and games in the last few years.
This kid has much more knowledge about the Amiga than you.
And does the kid boast here in this forum, claiming to be a great Amiga expert?
NO, he does not.
The kid would rather spend his time playing football or coding a new Amiga demo.


Why do you post hundreds of empty, contentless posts here?
Because this is all you can do.


For years you posted your TINA nonsense about 400 MHz/800 MIPS.
Reading the FPGA spec sheet, and finding its maximum clock rate there,
would have taken literally only minutes.
Why did you and your friend not do this in all these years?

There are two possible reasons:
a) you did not understand the spec sheet
b) you did not care to get informed about what you talk about.
You are perfectly happy to talk bullshit.



Karlos 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 18:08:21
#183 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3118
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Erm...

_________________
Doing stupid things for fun...

matthey 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 19:36:17
#184 ]
Super Member
Joined: 14-Mar-2007
Posts: 1684
From: Kansas

@Gunnar
Other than the 3 core developers, Jens, Chris and Gunnar, who were the first 3 Apollo Core team members?

I just want to verify you are really Gunnar von Boehn.

Gunnar Quote:

But was having no clue ever be a problem for an armchair expert?


In your e-mail to me on March 26, 2012 you wrote, "I value your 68K knowledge and opinions." On March 27, 2012 you wrote, "Welcome to the APOLLO-team then. I believe we very much need some good coders as consultants in the team to make the right decisions."

Gunnar Quote:

I think it would have been better for MOTO if they had used the
size="11" encoding of the immediate instructions for
OPP.L #16bit,(ea)

This encoding was a hole, and the 020 used it to add all those CAS/CAS2
and CMP2/CHK2 instructions.
The 060 dropped those instructions anyway. I think no one needs them.
This encoding would have been ideal to solve this perfectly cleanly.
(See attached file: Decoder.ods)

What do you think?


I think it would be better if a 64-bit mode allowed size="11" for a 64-bit operand size. This encoding would have been ideal to solve this perfectly cleanly. It is much easier to decode than a prefix, and there is no prefix growth for 64-bit sizes. I already gave you a more compatible 16-bit immediate compression that uses an addressing-mode encoding instead of size="11". What do you think?

Mit freundlichen Grüßen / Kind regards

cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 19:42:29
#185 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Gunnar

Quote:

Gunnar wrote:
Quote:

Gunnar wrote:
@cdimauro

- We all know that you are a self-proclaimed Amiga hardware and CPU expert.

Am I not?


No, you are not!

You would like us to think that you are one,
but you are absolutely not. You have literally no clue about what you are talking about.

Care to prove it, Mr. I-never-prove-my-statements?

Quote whatever you want from what I've said about the Amiga and PROVE it (not with empty words and ZERO facts, like you usually do). Otherwise I'll have to call you a LIAR!
Quote:
If you are an Amiga and CPU expert, where are the many super demos and games you wrote in the last 5 years? How many Amiga games did you write?
None?

Again with a logical fallacy! It's unbelievable how a hardware engineer, who is supposed to have strong logic, can continuously fall into logical fallacies outside of his own domain...

Anyway, here you're served:
Fightin' Spirit

USA Racing (I'm the fourth in the photo, from left to right).
USA Racing
USA Racing

Satisfied? Now go home and cry, baby!
Quote:
And why did you not write any?
Because you can't!

All you can do is post clueless nonsense in forums.
You are only a pretender, not someone with real knowledge.

See above and shut up, lamer and ignoramus!
Quote:
I know a school kid that wrote tons of demos and games in the last few years.
This kid has much more knowledge about the Amiga than you.
And does the kid boast here in this forum, claiming to be a great Amiga expert?
NO, he does not.
The kid would rather spend his time playing football or coding a new Amiga demo.

Well, I had the time to do a lot of sport AND also squeeze the most out of my Amigas for the two games above that I worked on.

And now, borrowing the same logical fallacy: care to show which "super demos" (sic!) or games (not trivial ones: Fightin' Spirit was a AAA title of its time and the best game of the year in '96; USA Racing would have been the same: see its technical specs in the third link about it) you have worked on?
Quote:
Why do you post hundreds of empty, contentless posts here?
Because this is all you can do.

Hey, don't swap your clothes with mine: actually, YOU are the one who fires off complete bullsh*t and NEVER proves your claims, even after several requests.
Quote:
For years you posted your TINA nonsense about 400 MHz/800 MIPS.
Reading the FPGA spec sheet, and finding its maximum clock rate there,
would have taken literally only minutes.
Why did you and your friend not do this in all these years?

There are two possible reasons:
a) you did not understand the spec sheet
b) you did not care to get informed about what you talk about.
You are perfectly happy to talk bullshit.

Hey, parrot: I've already replied to that in my previous post. Are you blind? Or is it "only" because Nature was a very bad stepmother to you?


@Karlos

Quote:

Karlos wrote:
Erm...



Naah. Why should this end? Slapping Gunnar around outside of his kingdom is priceless. Let people continue to enjoy the show.

cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 20:04:27
#186 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@matthey

Quote:

matthey wrote:

Desktop wannabe RISC ISAs
Alpha, PA-RISC (worst code density RISC first to die)
|
V
MIPS, SPARC, PPC (normal fat RISC died slowly)
|
V
AArch64, RISC-V C (new less RISCy improved code density RISC)

Embedded RISC ISAs
SuperH (worst code density, RISC instruction overload, first to die)
|
V
Thumb (normal embedded RISC died next, less RISC instruction overload)
|
V
Thumb2 (new improved code density RISC, tolerable RISC instruction fluff)

Some people say code density doesn't matter.

They are people who clearly have no clue what they're talking about.
Quote:
There would be a lot of coincidences that the fattest RISC ISAs died first, replaced by the less fat ones and then they were replaced by the least fat ones today. Wouldn't it just be easier to use CISC ISAs which can beat them all in code density? Wouldn't it be easier just to use the best code density CISC ISAs to avoid being replaced like happened with RISC ISAs?

It makes sense generally speaking, but it also means that there's no room for innovation or for removing bad design decisions.
Quote:
Where could we find one of the best code density ISAs, with good performance traits and without as much decoding overhead as fat CISC with a bunch of prefixes?

Not on the RISC land, of course.

Anyway, I'm biased here, as you know.

cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 20:26:44
#187 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@bhabbott

Quote:

bhabbott wrote:
@matthey

Quote:

matthey wrote:

Sigh. Code density is more about instruction cache efficiency.

And smaller code uses less cache. So you agree with me.

No. You stated something different before, which goes against what Matt said:

Vampires have at least 120MB of RAM, so a few kB here and there is nothing.


To justify increases in code size due to prefix usage.
Quote:
Quote:
Gunnar's logic is likely similar to yours. The 64 bit features will be rarely used so it is fine if they are high overhead using a prefix as long as they have a low resource cost now. Adding register banks and having a large register file is cheap in FPGA with a low CPU clock rate and may give some extra performance so plan for today. There will never be an ASIC either so optimize the ISA and design for a FPGA that will save resources and give benefits today. It is poor planning with a self fulfilling prophesy that the future will never come.

A bird in the hand is worth two in the bush.

I don't know about Gunnar's plans, but I doubt there will be an ASIC in the 68080's near future whatever its architecture. The huge advantage of FPGA is that it can be one thing today and something quite different tomorrow, without having to change any hardware. This protects and extends the user's investment. My Vampire is significantly better now than when I bought it a few years ago, simply by uploading a new bitstream. If it was an ASIC I would be stuck with outdated and possibly buggy hardware.

Exactly. An ASIC could be realized only once the chipset (at least) is stable, which is absolutely not the case currently.
Quote:
Like the saying goes "The perfect is the enemy of the good". As we get older we tend to want more and more perfection, with the result that nothing actually gets finished - until one day it's too late. This is the problem Commodore had with AAA - the engineers had grand plans for a chipset with 'bleeding edge' features and performance, but development dragged on and then they had to rush out the barely adequate AGA chipset before it was too late. If only they had settled for a less 'perfect' design in the first place they might have gotten AGA out 2 years earlier and the Amiga could have stayed relevant for a while longer (during which time they could have been developing the next generation chipset to replace it).

Which wasn't the case. Actually, the problem wasn't seeking perfection; rather, Commodore LACKED experienced engineers who had done chipset design, after Jay Miner and some other people left the company.

That's why it took so many years after the first chipset to deliver just the ECS: the engineers who remained needed time to get used to this new job.

In the meanwhile the competitors quickly closed the gap and took the advantage...
Quote:
Quote:
A 2 byte prefix can hold twice as much data so 64 bit extensions and extra register accesses could be placed in one prefix but then that wouldn't be common "if only a few instructions are prefixed" and extra registers shouldn't be needed as often as the 68k normally has 16 GP register while x86 only has 8 without a prefix.

x86 used prefixes in an attempt to stay close to the 8080 ISA (to ease the porting of CP/M assembly code, which was considered important at the time). Prefixes were applied to many instructions to make up for the lack of GP registers.

That's not true. The 8086's prefixes were, and are, used to:
- override the default segment for accessing the source or destination memory location (only the source for memory-to-memory instructions);
- assert the bus lock for some instructions that do read-modify-write operations on memory;
- repeat string operations.

That is, unless you were talking about the REX prefix used on x86-64 to access the additional 8 registers.
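To make the list above concrete, here is a small Python sketch that peels exactly those 8086 prefix bytes off an instruction stream (a toy classifier, not a full decoder):

```python
# 8086 prefix bytes: segment overrides, bus lock, and repeat prefixes.
SEGMENT_OVERRIDE = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}
LOCK = 0xF0                      # bus lock for read-modify-write ops
REPEAT = {0xF2: "REPNE", 0xF3: "REP"}

def peel_prefixes(code):
    """Split a byte string into its leading prefixes and the rest."""
    prefixes, i = [], 0
    while i < len(code):
        b = code[i]
        if b in SEGMENT_OVERRIDE:
            prefixes.append(SEGMENT_OVERRIDE[b] + ":")
        elif b == LOCK:
            prefixes.append("LOCK")
        elif b in REPEAT:
            prefixes.append(REPEAT[b])
        else:
            break                # first non-prefix byte is the opcode
        i += 1
    return prefixes, code[i:]

# F3 A4 encodes REP MOVSB; 26 8B ... is a MOV with an ES: override.
```

A real decoder does the same peeling loop before reaching the opcode proper, which is why prefixes add decode overhead byte by byte.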
Quote:
Opcodes referencing 8 bit registers were extended to 2 bytes as it was considered that 16 bit registers should get the shorter opcodes.

Again, this isn't the case: instructions using 8-bit or 16-bit data have the same encodings and lengths on the 8086 and 8088.
Quote:
This made the 8088 slower than a Z80 running equivalent code at the same clock speed.

Any benchmark for this?

P.S. Already replied on the rest.

cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 21:04:32
#188 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

Quote:

As usual, you don't read what people write. In fact, I've already created the architecture that kills not only x64 but all the others (that I know of). AND I've posted some preliminary results (they could be much better with a backend for it, which would allow fully exploiting its features) in this thread:
https://amigaworld.net/modules/newbb/viewtopic.php?mode=viewtopic&topic_id=44169&forum=17&start=0&viewmode=flat&order=0#841312
Nice, eh?

BTW, I haven't yet shown the slides with the comparisons between its vector unit and Intel's AVX-512, RISC-V's vector extension, and ARM's SVE (a vector extension as well), because they are even more embarrassing for them.

To quote a typical Sony fanboy, it has no games.

Creating a new CPU instruction set with no games.... that's a failure for most of Amiga's audience.

It depends on how you create the ISA. For mine, I was inspired by the work that Stephen Morse did on Intel's 8086.

The 8086 was successful because it was almost 100% source-level compatible with the 8085, so porting existing applications from the latter was very simple and effective.

I know perfectly well that the software library is very, very important, and that's why my primary goal with NEx64T was to have 100% assembly-level compatibility with IA-32 and x86-64.

This means that usually a recompilation is enough to get a binary for my architecture, with the exception of applications that make assumptions about the instructions' opcode structure (assemblers, compilers, debuggers, JIT compilers).

However, since any IA-32/x86-64 instruction can usually be mapped to a corresponding one on NEx64T, adapting the more difficult applications is quite simple.
Quote:
Have you departed from Amiga's primary target audience?

No, because I had/have a different goal & strategy (see above).
Quote:
Vampire's success is mostly due to carrying the WHDLoad 68k Amiga game library legacy to the next level of hardware performance, not abandoning legacy like the PowerPC camp; i.e., Apple's 68K-to-PPC migration model wouldn't work for the Amiga's game-console-like nature.

PiStorm/Emu68 also respects the WHDLoad 68k Amiga game library legacy.

Your fictional x64 CPU replacement does nothing for Geekbench, Cinebench R20/R23, Blender 3D, x64 PC games, etc.

See above. And BTW, for the same reasons I've explained, it's also much easier to build a performant IA-32/x86-64 emulator for applications that exist only as binaries.
Quote:
AMD is pretty good at packing transistors into a given area.

[image removed]

AMD's 40 nm Bobcat with 400 million transistors and 74 mm2 die size vs Intel's 45 nm Pineview with 178 million transistors and 87 mm2 die size.

Bobcat used a better production process, and it still sucked a lot at performance compared to the Atom.
Quote:
The instruction set alone doesn't complete the solution for a chip product when layout skills are also important.

The uncore is a different thing, and it's orthogonal to the processor's cores.
Quote:

Hammer wrote:
Embedded processor mindset

It isn't mine. As clearly stated in the slides that you shared, my architecture scales from embedded to HPC.

In fact, it can go from:
16 x GP registers + 0 x FPU registers + 0 x SIMD registers + 0 x Mask registers

up to:
32 x GP registers + 8 x FPU registers + 32 x SIMD registers + 16 x Mask registers

depending on the specific needs.
Quote:
does nothing for closed-source legacy software.

See above: when sources are missing, an emulator or JIT compiler could be created, and with good performance.
Quote:
I have read Cesare Di Mauro's http://docplayer.net/50921629-X86-x64-assembly-python-new-cpu-architecture-to-rule-the-world.html

Thanks for sharing it. However, the talk was about the second version of my architecture, which is very, very different from the current one (the 10th).

Some remarkable differences: SIMD registers now number only 32 at maximum, and decoding the instructions is a bit more complicated (I've introduced more opcode formats to improve the code density, and also for the general memory-to-memory instructions).

But it paid off. From the slides (v2 version):

Adobe Photoshop CS6 public beta - 32-bit

Total instructions: 1746569
Class     Count      %     Avg sz  NEx64T   Diff  Diff %
INTEGER   1631136  93.39    3.2     3.4      0.2   +5.6%
FPU        114521   6.56    3.2     3.6      0.4  +13.9%
SSE           912   0.05    4.0     4.7      0.6  +16.1%
Size: 5634556   NEx64T size: 5982402   Diff: 347846

Global result: +6.2%


Adobe Photoshop CS6 public beta - 64-bit

Total instructions: 1737331
Class     Count      %     Avg sz  NEx64T   Diff  Diff %
INTEGER   1638505  94.31    4.3     3.5     -0.8  -17.8%
SSE         93942   5.41    5.2     4.5     -0.7  -12.9%
FPU          4884   0.28    3.1     3.2      0.0   +1.1%
Size: 7556180   NEx64T size: 6239790   Diff: -1316390

Global result: -17.4%


Now (v10 version):

Adobe Photoshop CS6 public beta - 32-bit

Total instructions: 1746569
Class     Count      %     Avg sz  NEx64T   Diff  Diff %
INTEGER   1631136  93.39    3.2     2.9     -0.3   -8.7%
FPU        114521   6.56    3.2     3.7      0.6  +17.5%
SSE           912   0.05    4.0     4.5      0.5  +11.5%
AVXFAKE       912   0.05    5.0     4.5     -0.5   -9.7%
AVX512F       912   0.05    6.4     4.5     -1.9  -30.2%
Size: 5634556   NEx64T size: 5240124   Diff: -394432

Global result: -7.0%


Adobe Photoshop CS6 public beta - 64-bit

Total instructions: 1737331
Class     Count      %     Avg sz  NEx64T   Diff  Diff %
INTEGER   1638505  94.31    4.3     3.2     -1.1  -25.9%
SSE         93942   5.41    5.2     4.7     -0.5   -9.6%
AVXFAKE     93942   5.41    5.3     4.7     -0.7  -13.0%
AVX512F     93942   5.41    6.8     4.7     -2.2  -31.6%
FPU          4884   0.28    3.1     3.5      0.4  +11.8%
Size: 7556180   NEx64T size: 5686156   Diff: -1870024

Global result: -24.7%

Note: AVXFAKE and AVX512F are the same SSE instructions but encoded for AVX2 and AVX-512, respectively, just to have a comparison with those two SIMD extensions.

This also proves that designing architectures isn't a trivial task, if you want to achieve certain results.
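As a sanity check, the headline figure in the 64-bit v10 table can be recomputed from the posted totals (numbers taken directly from the table; per-class averages there are rounded):

```python
# Totals from the 64-bit v10 table above, in bytes.
x86_total    = 7_556_180   # x86-64 build of the Photoshop CS6 beta
nex64t_total = 5_686_156   # same code recompiled for NEx64T

diff = nex64t_total - x86_total
pct = 100.0 * diff / x86_total
print(f"{diff:+d} bytes ({pct:+.1f}%)")   # -1870024 bytes (-24.7%)
```

The printed percentage matches the quoted -24.7% global result.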
Quote:
Cesare Di Mauro's recompilation of closed-source x86/x86-64 PC legacy software will break DRM/anti-cheat certificate checks. Valve has invested in R&D with DRM providers for SteamOS's Proton/DXVK layer.

I don't see any blocker: DRMs could be adapted, updated, or... deceived.

cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 21:22:01
#189 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Karlos

Quote:

Karlos wrote:
@Karlos

Quote:

Karlos wrote:
@matthey

Specifically, I'm only interested in conditional branches. These occupy about 25% of the available primary opcode space in MC64K. Those slots can be replaced with fast path register to register instructions that are 33% smaller than the current realisation and are even simpler to decode. Totally worth it, I think.


Somewhat returning to the topic of code density. I made the changes indicated above in a branch. To recap:
- Normal instructions are of the form [ opcode ] [ ea dst ] [ ... ] [ ea src ] [...]
- Where the operand EA is a register direct type, this was a single byte, but the EA function at runtime still has to be called.
- A fast path rearranged these into [ fast prefix ] [ opcode ] [ reg pair ]

This meant that a typical "fast path" operation like add.l d0, d1 still took 3 bytes to encode, even if it did skip the whole EA decode logic at runtime.

About 25% of the Opcode values were taken up with a fairly rich set of compare and branch instructions (as there's no CC).
- These were changed into just 2 (for now) that use a sub-opcode as the condition to check.
- This freed up a large number of opcodes to use as a reworked fast path of the form [ opcode ] [ reg pair ] which is much more in line with my other load/store VM designs.

Well done. That's something which was also missing on the VAX, and it could have improved the code density and execution a lot.
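The encoding split described in the quote can be sketched as a toy interpreter (Python; all opcode values and names here are hypothetical, not MC64K's actual ones):

```python
# General form: [opcode][ea dst ...][ea src ...] - each operand goes
# through an EA-decode call at runtime, even for register-direct.
# Fast path:    [opcode][reg pair]  - one byte names both registers.

regs = [0] * 16

def decode_ea(code, pc):
    # register-direct EA: one byte, but still a function call per operand
    return regs, code[pc] & 0x0F, pc + 1

def op_add_general(code, pc):
    bank, dst, pc = decode_ea(code, pc)
    bank2, src, pc = decode_ea(code, pc)
    bank[dst] += bank2[src]
    return pc

def op_add_fast(code, pc):
    pair = code[pc]                  # dst in high nibble, src in low
    regs[pair >> 4] += regs[pair & 0x0F]
    return pc + 1

OPCODES = {0x10: op_add_general, 0x90: op_add_fast}

def run(code):
    pc = 0
    while pc < len(code):
        pc = OPCODES[code[pc]](code, pc + 1)

# add.l d0,d1 takes 3 bytes in the general form (opcode + two EA bytes)
# but only 2 on the fast path - and skips both EA-decode calls.
```

Same operation, one byte less to fetch and no EA calls on the fast path: that is where both the size reduction and the speedup come from.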
Quote:
To test this, I have a simple absolutely naive mandelbrot generation program, that plots the set at 2048 x 2048 with a max iteration depth of 128.

- The original version (using the 3-byte fast path encoding where possible) produces a bytecode chunk in the binary of 483 bytes.

- The code-density version produces a bytecode chunk of 428 bytes, a reduction of about 11.4%

However, what matters is execution. Each version was run a fixed number of times, and the best values reported by the time command were used (the VM was compiled without the internal instrumentation option, which would skew the results).

- Original: 5.181 seconds (user time)

- Code Density: 3.873 seconds (user time)

That's an increase of 33.7%, almost exactly the same as the effective reduction in opcode size going from 3 bytes to 2. However a factor in that is also that the 2 byte instructions are also simpler to decode as they have no prefix.

I think that the only factor driving the performance improvement is the much simpler decode path.

The code density here shouldn't matter, because the instructions of your ISA are just bytes sitting in the data cache of the real processor. And BTW, the difference in size is negligible compared to the data cache available on your processor.
Quote:
Peak interpretive performance for the new fast path instruction format is about 620 MIPs (add.q r1, r0) on my machine. It was about 500 previously. I probably need to improve the benchmarking methodology, e.g. locking to a single core and preventing speed stepping.

Indeed.
Quote:
Further improvements are possible because there are no equivalent "fast path" variants for any compare and branch (except for dbnz). The mandelbrot code bailout check has an fbgt.s instruction (which is called up to 128 times per pixel) that would benefit from:
- A fast path for register to register (currently is EA decode)
- A short form given it currently only supports 32-bit branch displacements.

Consider also adding a fast path for ternary instructions (x = y op z): it will improve both the code density and the execution time a lot (fewer instructions executed).
@Karlos

Quote:

Karlos wrote:
@cdimauro

Quote:

cdimauro wrote:

Have you made an RTL/HDL design (that was the discussion before) out of one of your architectures?


No. While I've messed with hardware, I'm really not a hardware designer. I don't think my designs would necessarily lend themselves well to hardware implementation:
- They tend to be based on bytecode enumerations rather than individual bits having any specific meaning. While this is good for software realisations, I expect this is a poor match for actual hardware logic.

Indeed. Your ISA was designed to be easy to decode and execute on a host processor, which is good for your goals (like WASM, for example).

But for a hardware implementation the design would have to be quite different.
Quote:
- They don't generally have any condition codes. I started that way but soon stopped because the code that tends to get written doesn't need them until there's a branch and it's generally a simpler proposition to have a "compare and branch" instruction than maintain a ton of state you don't use most of the time. I expect this is also a poor fit for hardware because I can envisage it's quite easy to route ALU signals from any operation into a bit position of some CC register - effectively something you get "for free". Some architectures, e.g. PPC have variations that do or do not update the CC but I've yet to see any real hardware that doesn't rely on condition codes for implementing branching.

Correct. But if you implement condition codes in your ISA, then the performance could also get much worse, because you have to calculate the flags for each arithmetic operation. Unless you provide equivalents that do not alter the CC, but you have limited free space in your opcode table.

Anyway, there are many architectures that have no CC and still have good performance. RISC-V is the latest one...
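A toy Python contrast of the two models being discussed (hypothetical VM code, not MC64K's actual implementation): one VM pays the flag-update cost on every ALU op, the other evaluates the condition only at a fused compare-and-branch:

```python
class FlaggedVM:
    """Every ALU op also computes condition codes, used or not."""
    def __init__(self):
        self.z = self.n = False              # CC state carried between ops

    def sub(self, a, b):
        r = a - b
        self.z, self.n = (r == 0), (r < 0)   # flag cost paid on every op
        return r

    def branch_le(self):
        return self.z or self.n              # branch reads stored flags

class CompareAndBranchVM:
    """No persistent flags: the condition lives only in the branch."""
    @staticmethod
    def branch_le(a, b):
        return a <= b                        # evaluated only where needed
```

The flagged variant carries state that most instructions never consume; the fused variant computes the condition exactly once, at the branch that needs it.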
Quote:

Karlos wrote:
The code density improvements to mc64k are now merged. The Mandelbrot test shows around a 42% performance improvement. Pretty much all the arithmetic in the main loop is now using one of the 2-byte register to register operations.

Which explains the gain, IMO.
Quote:
https://github.com/IntuitionAmiga/MC64000/blob/main/assembler/test_projects/mandelbrot/src/sp_register.s

Think about a version with ternary instructions.

kolla 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 21:30:10
#190 ]
Elite Member
Joined: 20-Aug-2003
Posts: 2310
From: Trondheim, Norway

To prove this is the real Gunnar is easy enough… just ask in which decade the Amiga was released.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

QBit 
Re: The (Microprocessors) Code Density Hangout
Posted on 28-Sep-2022 22:28:52
#191 ]
Regular Member
Joined: 15-Jun-2018
Posts: 247
From: Unknown

@all

C'mon, go and have some sex; everybody has sex except you fools!

Make love, not war!

You will end up in a world of machines, and the kids will be the fools controlled by those machines!

I should play in the dirt again like when I was a little kid. Hunting frogs and blowing them up! *lol*

I know this is sadistic... but the world is sadistic too. For 30 years the Amiga scene waited for a comeback
of the Amiga! And all the world says is #### YOU!

War is going on in Europe. If I won the lottery I would go to Portugal.
The island of Madeira...
There is nothing interesting to nuke there except some trees, flowers, and houses!

Technology is complete shit!

Last edited by QBit on 28-Sep-2022 at 10:54 PM.

bhabbott 
Re: The (Microprocessors) Code Density Hangout
Posted on 29-Sep-2022 4:23:26
#192 ]
Regular Member
Joined: 6-Jun-2018
Posts: 227
From: Aotearoa

@cdimauro

Quote:

cdimauro wrote:

No. You stated something different before which is against what Matt said:

Vampires have at least 120MB of RAM, so a few kB here and there is nothing.


To justify increases on the code size due to the prefixes usage.

I also said, "If the prefixed instruction executes much faster and is smaller than the vanilla code it replaces, it's still worth it." It's obvious (to me at least) that smaller code fits in the cache better and so can run faster.

I can't find much info on the 68080's caching system - all I know is that it apparently holds 'decoded instructions'. Perhaps in that state the prefix doesn't take up any extra space in the cache, in which case there is no penalty apart from a tiny increase in main memory usage.

Quote:
Which wasn't the case. Actually the problem wasn't about seeking for the perfection, rather that Commodore MISSED experienced engineers which did chipsets design, after that Jay Miner and some other guy went out from the company.

According to many Amiga fans, Jay Miner designed the Amiga wrong. Did we really want him on the team? The people working on AAA were experienced engineers. The problem was that they set their sights too high - not just engineering-wise but regarding OS support and marketing too. Even if they had finished the design, it would have been expensive and would have required a lot of work to support in the OS. They were trying to give the Amiga high-end workstation-like features at a price that wasn't viable, to compete in a market that was being taken over by PCs. Commodore's big mistake was entertaining this over-ambitious engineering vision for too long.

Quote:
That's why it took so long to have just the ECS after so many years from the first chipset: the engineers which remained needed time to get used to this new job.

Not sure if it's true, but I read that they lost the plans for OCS and had to reverse-engineer it. Thanks, Jay! But that doesn't explain why they added productivity mode to ECS rather than expanding its gaming capabilities. Then they put it in the A3000, which already had a flicker fixer! The truth is their vision was the problem, not their engineering skills.

Quote:
In the meanwhile the competitors quickly filled the gap and went on advantage...

Not sure which competitors you are talking about. In 1990 when ECS was released the Amiga's main competition was games consoles like the Sega Mega Drive, 8 bit home computers like the C64, and the Atari ST which was the nearest 16 bit equivalent. The Amiga was perceived as slightly less capable than 16 bit consoles for arcade games, but much better for more sophisticated genres and far more desirable than the ST and other home computers.

But if you mean the PC then the Amiga was never really a contender, since it wasn't IBM compatible - a fact that Commodore was acutely aware of from the start but never really addressed (Sidecar and Bridgeboards don't count). IMO they shouldn't have even tried. They lost the plot when they produced the A3000 in an attempt to compete head-to-head with high-end PCs, when they should have just kept producing the A2000 line to satisfy high-end users and concentrated on the low-end home hobbyist / gaming market for sales volume. But the engineers weren't interested in games - until it was too late.

Quote:
That's not true. 8086's prefixes were, and are, used to:
- override the default segment for accessing the source or destination memory location (only the source for memory-to-memory instructions);
- signal the bus lock for some instructions which do read-modify-write operations on memory;
- repeat string operations.

- Override the default segment - would not be needed if it had plenty of 32 bit address registers.
- repeat string operations - are done with dbcc on 68k, viable because there are plenty of data registers available (not so on 8086, you must use CX).
- bus lock - Guessing its usage is rare - I never needed it in my 8086 code.
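The point about repeated string operations can be sketched like this (an illustrative comparison only; register choices and labels are mine, not from the thread):

```asm
; 8086: the REP prefix repeats MOVSB, decrementing CX until zero
; (copies CX bytes from DS:SI to ES:DI - CX is hard-wired as the counter)
        cld
        rep movsb

; 68k: the equivalent copy loop with dbra, on whichever data
; register happens to be free (copies d0+1 bytes, since dbra
; decrements and only falls through at -1)
copy:   move.b  (a0)+,(a1)+
        dbra    d0,copy
```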

Quote:
This unless you were talking about the REX prefix usage on x86-64 to access the additional 8 registers.

I gave up programming PCs in assembly language a long time ago - too painful and pointless for Windows apps. I fully admit to knowing nothing about x86-64 machine code past 8086 - and don't want to know.

Quote:

Quote:
Opcodes referencing 8 bit registers were extended to 2 bytes as it was considered that 16 bit registers should get the shorter opcodes.

Again, this isn't the case: instructions using 8 or 16 bit data have the same encoding and lengths on 8086 and 8088.

Sorry, you are right - seems I misremembered. Perhaps it was execution time I was thinking of.

Quote:
Quote:
This made the 8088 slower than a Z80 running equivalent code at the same clock speed.

Any benchmark for this?

My premise was wrong. Nevertheless...

Zilog Z80
Quote:
in general the 8088 is a more powerful processor, but in many important cases it is slower than the Z80. Consider the following table which shows some typical cases where the Z80 is faster...

It can easily be noticed that the Z80 is faster on the 8080 instructions which set program counter or make a memory access. Especially noticeable is the advantage of the Z80 at executing far conditional jumps. With the 8088, the offset at conditional jumps is only one byte, so when you need a farther jump you have to write two commands. The Z80 does not have such a problem, there are always two bytes allocated for a jump, and therefore in such cases a conditional jump for the Z80 is much faster by 6 or 9 clock cycles. Almost all instructions using the HL register are performed in the Z80 a little faster, and this includes addition, subtraction, comparison, BIT and other logical operations.
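The far-conditional-jump difference described in the quote can be sketched as follows (labels are illustrative):

```asm
; 8086: conditional jumps carry only an 8-bit displacement, so
; reaching a target beyond +/-127 bytes takes two instructions
        jz      skip            ; inverted condition, short jump
        jmp     far_target      ; unconditional, 16-bit displacement
skip:

; Z80: conditional JP always carries a 16-bit address,
; so a single instruction reaches anywhere in memory
        jp      nz,far_target
```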



cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 29-Sep-2022 5:13:05
#193 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@bhabbott

Quote:

bhabbott wrote:
@cdimauro

Quote:

cdimauro wrote:

No. You stated something different before which is against what Matt said:

Vampires have at least 120MB of RAM, so a few kB here and there is nothing.


To justify increases on the code size due to the prefixes usage.

I also said "If the prefixed instruction executes much faster and is smaller than the vanilla code it replaces, it's still worth it.". It's obvious (to me at least) that smaller code fits in the cache better and so can run faster.

But you missed the context: it was about using the extra features of the core, 64-bit for GP registers as the most important thing (the extra registers are the secondary one). This implies using the prefix, which makes the code much fatter.
Quote:
I can't find much info on the 68080's caching system - all I know is that it apparently holds 'decoded instructions'. Perhaps in that state the prefix doesn't take up any extra space in the cache, in which case there is no penalty apart from a tiny increase in main memory usage.

The cache just stores the instructions as they are, so the space cannot shrink. It can only grow, by adding tag bits (see Intel's and AMD's approach of storing instruction begin/end markers).
Quote:
Quote:
Which wasn't the case. Actually the problem wasn't about seeking for the perfection, rather that Commodore MISSED experienced engineers which did chipsets design, after that Jay Miner and some other guy went out from the company.

According to many Amiga fans Jay Miner designed the Amiga wrong.

Correct.
Quote:
Did we really want him on the team?

Considering the long delivery times and the other mistakes made, Jay Miner would have been far better to have.
Quote:
The people working on AAA were experienced engineers. The problem was they set their sights too high - not just engineering-wise

Well, the engineering side was already bad enough: they were supposedly experienced (which wasn't the case: see ECS, released only in 1990) but failed to recognize the complexity of the project...
Quote:
but regarding OS support

Which wasn't their duty, right? BTW, RTG was already available at the time. So, nothing impossible to have in a reasonable time. Worst case: buy it from a third party, as Commodore did with other technologies (narrator.device, AREXX).
Quote:
and marketing too.

Since when should engineers care about marketing?!?
Quote:
Even if they had finished the design it would have been expensive

Weren't they experienced engineers?!?
Quote:
and required a lot of work to support in the OS.

See above.
Quote:
They were trying to give the Amiga high-end workstation-like features at a price what wasn't viable, to compete in a market that was being taken over by PCs. Commodore's big mistake was in entertaining this over-ambitious engineering vision for too long.

So, in short: those experienced engineers completely failed...
Quote:
Quote:
That's why it took so long to have just the ECS after so many years from the first chipset: the engineers which remained needed time to get used to this new job.

Not sure if it's true but I read that they lost the plans to OCS and had to reverse-engineer it.

Ask Reneé for them: she found and published them. So, they weren't lost, apparently.
Quote:
Thanks Jay! But that doesn't explain why they added productivity mode to ECS rather than expand its gaming capabilities more. Then they put it in the A3000, which already had a flicker fixer! The truth is their vision was the problem, not their engineering skills.

Hey, didn't you say that they were experienced? So, experienced only at implementing the chip's logic?

It looks like experienced engineers without the right vision cause problems: what news!
Quote:
Quote:
In the meanwhile the competitors quickly filled the gap and went on advantage...

Not sure which competitors you are talking about. In 1990 when ECS was released the Amiga's main competition was games consoles like the Sega Mega Drive, 8 bit home computers like the C64, and the Atari ST which was the nearest 16 bit equivalent. The Amiga was perceived as slightly less capable than 16 bit consoles for arcade games, but much better for more sophisticated genres and far more desirable than the ST and other home computers.

Consoles had tiles and sprites, so it was much easier to write games.

The Amiga wasn't really a game console, because it lacked several important features. This gave coders headaches trying to squeeze the hardware.
Quote:
But if you mean the PC then the Amiga was never really a contender, since it wasn't IBM compatible -

Neither Macs, Atari ST, Archimedes, but... they were all fighting on the desktop market.
Quote:
a fact that Commodore was acutely aware of from the start but never really addressed (Sidecar and Bridgeboards don't count). IMO they shouldn't have even tried.

Why not? It was an additional market for them. Remember: Commodore had to make money. In fact, it sold PCs as well after the Amiga was released...
Quote:
They lost the plot when they produced the A3000 in an attempt to compete head-to-head with high-end PCs, when they should have just kept producing the A2000 line to satisfy high-end users and concentrated on the low-end home hobbyist / gaming market for sales volume. But the engineers weren't interested in games - until it was too late.

So, engineers continued to make mistakes through lack of vision.

OK, it might be so, but... what about their managers? They had even less vision, because they allowed all of that...
Quote:
Quote:
That's not true. 8086's prefixes were, and are, used to:
- override the default segment for accessing the source or destination memory location (only the source for memory-to-memory instructions);
- signal the bus lock for some instructions which do read-modify-write operations on memory;
- repeat string operations.

- Override the default segment - would not be needed if it had plenty of 32 bit address registers.

We were talking about the 8086: no wishful thinking here, eh!
Quote:
- repeat string operations - are done with dbcc on 68k, viable because there are plenty of data registers available (not so on 8086, you must use CX).

See above: wasn't it about 8086? Why did you talk about the 68k now?
Quote:
- bus lock - Guessing its usage is rare - I never needed it in my 8086 code.

Of course: it executed just single tasks under DOS and the like, so you had no need to contend for resources.
Quote:
Quote:
- Opcodes referencing 8 bit registers were extended to 2 bytes as it was considered that 16 bit registers should get the shorter opcodes.
Again, this isn't the case: instructions using 8 or 16 bit data have the same encoding and lengths on 8086 and 8088.

Sorry, you are right - seems I misremembered. Perhaps it was execution time I was thinking of.

Well, that's obvious: 16-bit memory operations required much more time on the 8088 with its 8-bit data bus.
Quote:
Quote:
- This made the 8088 slower than a Z80 running equivalent code at the same clock speed.
Any benchmark for this?

My premise was wrong. Nevertheless...

Zilog Z80
Quote:
in general the 8088 is a more powerful processor, but in many important cases it is slower than the Z80. Consider the following table which shows some typical cases where the Z80 is faster...

It can easily be noticed that the Z80 is faster on the 8080 instructions which set program counter or make a memory access. Especially noticeable is the advantage of the Z80 at executing far conditional jumps. With the 8088, the offset at conditional jumps is only one byte, so when you need a farther jump you have to write two commands. The Z80 does not have such a problem, there are always two bytes allocated for a jump, and therefore in such cases a conditional jump for the Z80 is much faster by 6 or 9 clock cycles. Almost all instructions using the HL register are performed in the Z80 a little faster, and this includes addition, subtraction, comparison, BIT and other logical operations.


Which are just some cases. It would be interesting to take a look at the common operations and compare their timings.

But, even better, to have benchmarks executed.

Karlos 
Re: The (Microprocessors) Code Density Hangout
Posted on 29-Sep-2022 6:29:09
#194 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3118
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@cdimauro

Quote:
Think about a version with ternary instructions


Some form of conditional move operation is definitely in the plan.

_________________
Doing stupid things for fun...

Gunnar 
Re: The (Microprocessors) Code Density Hangout
Posted on 29-Sep-2022 7:28:47
#195 ]
Regular Member
Joined: 25-Sep-2022
Posts: 152
From: Unknown

@cdimauro

Again:
Quote:

If you are Amiga and CPU experts, where are the many super demos and games you wrote in the last 5 years?


The question was what have you written for Amiga in the last 5 years?


1) Your claim that you participated in a game in the '90s does not answer the question.
2) No one today can prove how much you did there.

Gunnar 
Re: The (Microprocessors) Code Density Hangout
Posted on 29-Sep-2022 9:08:02
#196 ]
Regular Member
Joined: 25-Sep-2022
Posts: 152
From: Unknown

@cdimauro

Quote:
For years you posted your TINA nonsense 400Mhz/800Mips with 128bit memory bus
Reading the FPGA Specsheet and reading there its maximum clockrate, or max memory bus width
would have taken literally only minutes.


The Altera manual states clearly :
the maximum clock rate for all FPGA memory blocks is 238 MHz
the maximum clock rate for the FPGA multipliers is 200 MHz.
Very clearly the 400MHz you claimed is impossible.

The Altera manual also clearly states that the maximum supported external memory is 32bit.
While you claimed to connect 128 bit wide external memory.


For several years you "promoted" your project and claimed technically impossible values.

Either you posted the vapor claims on purpose, or you did not care to invest even a minute to read and verify some facts.

We can go over hundreds of your posts here ... and see the same.
You post claims without understanding and without verifying the facts.

Hammer 
Re: The (Microprocessors) Code Density Hangout
Posted on 30-Sep-2022 2:22:14
#197 ]
Elite Member
Joined: 9-Mar-2003
Posts: 4595
From: Australia

@cdimauro

Quote:
It depends on how you create the ISA. For mine I was inspired by the work that Stephen Morse did with Intel's 8086.

The 8086 was successful because it was almost 100% source-level compatible with 8085, so porting the existing applications from the latter was very simple and effective.

I know perfectly that the software library is very very important and that's why my primary goal with NEx64T was to have 100% assembly-level compatibility with IA-32 and x86-64.

This means that usually a recompilation is enough to get a binary for my architecture, with the exceptions of applications that make assumptions about the instruction's opcode structure (assemblers, compilers, debugger, JIT compilers).

However and since any IA-32/x86-64 instruction could be usually mapped to a corresponding one on NEx64T, adapting the more difficult applications is quite simple.

Wintel desktop world doesn't tolerate Motorola 68K's instruction set bastardization, i.e. a "to be or not to be" instruction set.

Both Z80 and 8085 are not X86.

Zilog Z80 is a software-compatible extension and enhancement of the Intel 8080 and it failed i.e. Z80 was defeated by Intel X86.

Zilog Z8000 and Z80000 weren't binary compatible with the Z80 and they both failed.

AMD's X86-64 is a software-compatible extension and enhancement of Intel's IA-32 with Microsoft being the kingmaker.

Unlike Intel IA-64 Itanium, AMD's X86-64 doesn't compromise IA-32's legacy runtime performance which is important for the PC gaming market.

IA-32 has two upgrade paths i.e. Intel's IA-64 and AMD's X86-64. Intel IA-64 is garbage at PC gaming.


You can try to implement your NEx64T with Transmeta style Code Morph Software (CMS) translation that is similar to PiStorm/RPi 3A+/Emu68 method as part of the retro X86 scene i.e. NEx64T with CMS competes against AO486 FPGA port for MiSTer FPGA.


ColdFire V1 to V4 is source code compatible with 68K and it's still unacceptable for Amiga users who have accumulated WHDLoad 68K games. I'd rather have AC68080 or PiStorm/Emu68 solutions over ColdFire V4.



My main reason for PiStorm/Emu68 is its relatively low price and respect for 68K Amiga legacy which includes accumulated WHDLoad 68K games. I'm okay with Firebird V4 since it's better than Vampire V2+.

----

AMD's AVX3-512 for Zen 4 is "double pumped" over multiple 256-bit units. Zen 3 has six FPU units.

Binary re-compilation would break DRM/Anti-cheat certificate validation schemes, hence breaking PC games. This is why Valve worked with DRM/Anti-cheat middleware vendors to accept Valve's compiled Proton/DXVK certificate for SteamOS. Open source Proton/DXVK is nearly meaningless for PC gaming without passing Microsoft's or Valve's valid certificate checks.

Valid certificate checks help enforce the PC's trusted computing initiative.

Last edited by Hammer on 30-Sep-2022 at 04:46 AM.

_________________
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 3080 Ti
Ryzen 9 7900X, DDR5-5600 32 GB RAM, GeForce RTX 3080 Ti
Amiga 1200 (rev 1D1, KS 3.2, TF1260, 68060 @ 63 Mhz, 128 MB)
Amiga 500 (rev 6A, KS 3.2, PiStorm/RPi3a/Emu68)

matthey 
Re: The (Microprocessors) Code Density Hangout
Posted on 30-Sep-2022 3:56:26
#198 ]
Super Member
Joined: 14-Mar-2007
Posts: 1684
From: Kansas

@Gunnar
You created an account on Amigaworld.net to first attack me and then never respond to my questions? Does the truth hurt so bad and my arguments are so good based on data that you can't rationalize your decisions or provide data why you made your decisions? Isn't it hypocritical that you attack cdimauro who isn't a hardware guy when you, somebody who worked professionally on CPU hardware, can't justify your bad CPU core design decisions?

Recall the paper I recently posted that showed code size could be increased by 17% due to REX prefixes alone. From x86 to x86-64, code size for the SPEC CPUint2006 benchmark increased 21%. Most of that increase is very likely due to REX prefixes. The data cache request rate decreased by 28% while the instruction cache request rate increased by 14%. Another paper showed a 42% increase in memory references from 16 to 8 GP registers. The REX prefix is a good tradeoff which reduces overall memory traffic to move from 8 GP registers to 16. There was no easy way to encode more registers without major changes to the encodings and the prefix allowed to add 64 bit sizes all with only one byte of code increase. Increasing past 16 GP registers is very likely only going to reduce data cache traffic by a low single digit percentage while a REX prefix increased instruction cache traffic by ~17%. Data cache accesses are generally less predictable than instruction cache accesses but variables in memory are usually in a stack frame or on the stack which provide a high cache hit rate. This high predictability of mem variables coupled with a dual ported data cache (common even on x86) may explain why so little performance is gained from x86 with 8 GP registers to x86-64 with 16 GP registers despite the huge reduction in data accesses. CISC is not handicapped when out of GP registers but gracefully transitions to using reg-mem variables with minimal performance loss while RISC thrashes memory unloading and loading variables.

AMD claimed the average instruction length from x86 to x86-64 only increased from 3.4 to 3.8 bytes for SPECint2000 (https://old.hotchips.org/wp-content/uploads/hc_archives/hc14/3_Tue/26_x86-64_ISA_HC_v7.pdf). It's not difficult to find x86-64 programs with average instruction lengths over 4 bytes though. Cdimauro's "integer" instructions for Photoshop show an increase from 3.2 to 4.3 bytes. With a 4 byte average, an ISA can have 32 GP registers. If compatibility wasn't so important, it would have been better to start over with a better ISA than x86 and change to a 16 bit encoding base. The 68k is in much better shape. A 64 bit mode allows to clean the encoding map up nicely without major decoding changes. A prefix is following the x86-64 disaster and it costs 2 bytes instead of one for the 68k. If the 68k average instruction length is 3 bytes, an average instruction would increase to 5 bytes with a prefix. The common 2 byte instruction would increase to 4 bytes. It's easy to say that more than 16 GP registers is rarely used so the prefix contributes little to code size and instruction size increases but then if it is so rarely used then why have all these extra integer registers? It's also easy to say that 64 bit operations requiring a 16 bit prefix are rarely used which they probably are now but do you want a gimp 64 bit ISA like x86-64 that needs a prefix for 64 bit operations? Do you want poor code density, longer instructions and more decoding overhead like x86-64 or a lean and mean 64 bit 68k ISA with one of the best possible 64 bit code densities?
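A concrete illustration of the REX prefix cost discussed above: the same register-to-register add in three flavours, with the standard x86/x86-64 encodings (byte values follow the usual opcode + ModRM scheme and can be verified with any assembler):

```asm
        add  eax, ebx     ; 01 D8      -> 2 bytes: 32-bit op, legacy registers
        add  rax, rbx     ; 48 01 D8   -> 3 bytes: REX.W needed for 64-bit size
        add  r8, r9       ; 4D 01 C8   -> 3 bytes: REX also needed to reach r8-r15
```

In other words, both the 64-bit operand size and the extra registers cost the same one-byte prefix, which is where the ~17% instruction-fetch increase cited above comes from.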

I've always been honest with you. If you are looking for someone that tells you what you want to hear then I'm not that person. I know where I'm not wanted. I didn't sabotage your project though. You are doing a good enough job of that by yourself. At least your project is in FPGA so mistakes can be corrected if you are willing to admit them. Admitting mistakes is more manly than attacking others for their mistakes.

Hammer 
Re: The (Microprocessors) Code Density Hangout
Posted on 30-Sep-2022 4:31:03
#199 ]
Elite Member
Joined: 9-Mar-2003
Posts: 4595
From: Australia

@bhabbott

Quote:

Not sure which competitors you are talking about. In 1990 when ECS was released the Amiga's main competition was games consoles like the Sega Mega Drive, 8 bit home computers like the C64, and the Atari ST which was the nearest 16 bit equivalent. The Amiga was perceived as slightly less capable than 16 bit consoles for arcade games, but much better for more sophisticated genres and far more desirable than the ST and other home computers.

But if you mean the PC then the Amiga was never really a contender, since it wasn't IBM compatible - a fact that Commodore was acutely aware of from the start but never really addressed (Sidecar and Bridgeboards don't count). IMO they shouldn't have even tried. They lost the plot when they produced the A3000 in an attempt to compete head-to-head with high-end PCs, when they should have just kept producing the A2000 line to satisfy high-end users and concentrated on the low-end home hobbyist / gaming market for sales volume. But the engineers weren't interested in games - until it was too late.


ECS's high-resolution modes reflect a mid-80s low-color "business resolution" product development mentality.

ECS's design mentality is the same as Commodore 128's aging C64 gaming hardware with high resolution/low color count "business" modes.

Amiga 3000's flicker fixer with a frame buffer was an attempt to catch up to IBM VGA's flicker-free 16-color 640x480 business resolution mode.

The problem: IBM had XGA and the 8514 with high "business" resolutions and high-color display modes. Lower-cost 8514/A clones such as the ET4000AX ISA reached $129 USD (ref 1) in 1992.

Commodore wasted A3000's 32-bit Chip RAM bandwidth increase on ECS, while the C65's chipset, capable of a 256-color display with a 4096-color palette, was completed in December 1990.

Commodore engineers completed two 256-color-capable chipsets in the December 1990 to March 1991 time frame.

My family owned an ex-corporate Amiga 3000/030 @ 25 Mhz (KickStart 2.04 ROM variant) in early 1992. Amiga ECS acted like a boat anchor, and ditching a reasonably good 68030/68882 @ 25 Mhz CPU/FPU along with the entire Amiga 3000 wasn't a good option.

At the same time as our Amiga 3000/030, my family also owned a 386DX-33 PC clone with 16-bit ISA slots, replacing our earlier ex-corporate IBM PS/2 Model 55SX with 16-bit MCA slots.
Like many other users, my Dad was pissed off with the higher-cost add-on cards for IBM's MCA, and that was the last desktop IBM PC in our family.

386DX-33 PC clone with ET4000AX served my gaming in place of A1200 AGA until 1996 Quake.

According to Dave Haynie, the AGA was completed in March 1991 and Bill Sydnes (of IBM PC Jr) focused on aging ECS Amigas. Dave Haynie advocated for A3000 AGA motherboard upgrades and Commodore management disagreed.

Amiga 3000 has 32-bit Zorro slots that competed against the 386DX PC's 32-bit EISA and the 486's 32-bit VL-Bus. Amiga 2000's 16-bit Buster and 16-bit Zorro II were aging.

Amiga 1200's 32-bit Budgie (for 32-bit trap door expansion bus) was a scaled-down 32-bit Super Buster, hence Amiga 3000's development was important.

For Amiga chipset f**kup, focus on ex-IBM PC Jr project manager Bill Sydnes.

There are causes for my extreme dislike for both Motorola and IBM.

Reference
1. PC Mag, August 1992, p. 604: Diamond Speedstar 24 (ET4000AX ISA) at $169.
https://archive.org/details/bub_gb_hqQJaNzN9IcC/page/n603/mode/2up



Last edited by Hammer on 30-Sep-2022 at 04:53 AM.


cdimauro 
Re: The (Microprocessors) Code Density Hangout
Posted on 30-Sep-2022 4:45:59
#200 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Karlos

Quote:

Karlos wrote:
@cdimauro

Quote:
Think about a version with ternary instructions


Some form of conditional move operation is definitely in the plan.

Which is good.

But no ternary instructions? Like fadd.s f0,f1,f2.

Looking at your Mandelbrot example it should gain a lot in terms of code density (those ternary instructions should use a 3-byte encoding) and, especially, performance.
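For comparison, a sketch of what the two-operand form costs versus the proposed three-operand form (the ternary syntax is hypothetical, exactly as written in the post):

```asm
; two-operand 6888x-style form: the destination is also a source,
; so keeping fp1 intact costs an extra register move
        fmove.x  fp1,fp0
        fadd.x   fp2,fp0      ; fp0 = fp1 + fp2

; hypothetical ternary form: one instruction, no source clobbered
        fadd.s   f0,f1,f2     ; f0 = f1 + f2
```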


Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle