Hammer 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 7:44:35
[ #161 ]
Elite Member
Joined: 9-Mar-2003
Posts: 4600
From: Australia

@cdimauro

Quote:

LOL: how to overturn the reality.

Do you understand that the 68060 has NO "slow" instruction path? What are you talking about?!?

With 68060 support software, the 68060 traps unsupported instructions and the software emulates them, hence they are on a slow path.

Refer to CyberPatcher's or Oxyron Patcher's purpose, but they are at AmigaOS level. Hint: Coming from the X86 world, they are not good enough.

I prefer Pi-Storm/Pi-3a/Emu68's below the OS level method.
I prefer X86-64's microcode firmware updates below the OS level method.

You haven't realized why I keep mentioning AMD?

1. AMD acted as credible second source insurance when Intel wanted to end X86 with IA-64 Itanium. Intel is on record that they wanted to kill the X86 just as Motorola killed 68K.

2. AMD didn't shorten X87 backward compatibility like Apollo Core's 68EC080's shortened FPU. AMD is proven to be loyal to X86, while Intel has proven to be disloyal to X86.

https://www.youtube.com/watch?v=jB9FrBWrbOA
Dave N. Cutler speech on AMD K8 x64 processor. Microsoft hated Intel Itanium. Microsoft's kingmaker move, adios Intel Itanium.

You can't read.

Quote:

And guess what: it made the 68060 INCOMPATIBLE with A LOT of existing software!

Apollo Core's 68EC080 follows Motorola's inferior software legacy protection tradition when compared to the X86 world.

You haven't realized why I keep mentioning AMD?

1. AMD acted as credible second source insurance when Intel wanted to end X86 with IA-64 Itanium. Intel is on record that they wanted to kill the X86 just as Motorola killed 68K.

You have forgotten Intel's Itanium adventure.

2. AMD didn't shorten X87 backward compatibility like Apollo Core's 68EC080's shortened FPU.

You can't read.

Quote:

But NOT the 68060. Go to Motorola and ask why they didn't provide backward-compatible processors. No, not only the 68060: I mean ALL processors AFTER the 68000.

With 68000 contexts, not a major issue with WHDLoad game patches.

For A500, the pre-configured Pi-Storm/Pi-3a/Emu68's 32 GB MicroSD card is pre-loaded with many WHDLoad games. Amiga legacy games are easy to obtain.

I can switch from Witcher 508 with 68HC000 @ 50Mhz towards Pi-Storm/Pi-3a/Emu68.

I use AMD as the benchmark for respecting software legacy, and Intel has attempted to kill the X86 with Itanium.

Apollo Core's AC68EC080 can be so much more than a follower of Motorola's disrespect for the 68K.

In certain aspects, Apollo Core is better than Motorola, but Apollo Core is not on the same level as the two X86 CPU vendors. Apollo Core still has some Motorola mindset when it comes to respecting the 68K legacy.

My loyalty is for myself and my own interest, not AMD.



Last edited by Hammer on 13-Aug-2022 at 08:18 AM.
Last edited by Hammer on 13-Aug-2022 at 08:09 AM.

_________________
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 3080 Ti
Ryzen 9 7900X, DDR5-5600 32 GB RAM, GeForce RTX 3080 Ti
Amiga 1200 (rev 1D1, KS 3.2, TF1260, 68060 @ 63 Mhz, 128 MB)
Amiga 500 (rev 6A, KS 3.2, PiStorm/RPi3a/Emu68)

Hammer 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 8:00:45
[ #162 ]
Elite Member
Joined: 9-Mar-2003
Posts: 4600
From: Australia

@cdimauro

Quote:
Sure. SOME software works, but not ALL.


You haven't realized why I keep mentioning AMD?

Quote:

Ah, no? And then please tell me: why you've posted a video that actually... shows it

For this topic.

https://www.youtube.com/watch?v=1B1jKjrRUmk
Doom on A1200/TF1230's 68030 @ 50 Mhz with AGA vs PC's 386 40 Mhz with ET4000.

Very similar performance - Buzzing Retro Computing

John Carmack's argument includes the install base context.

Quote:

I expect that a proper 68K processor has no problem executing ALL software which was developed BEFORE it was sold.

This is the meaning of backward-compatibility, which Intel followed and it's famous for.

But which lacked with ALL Motorola processors after the 68000.

Intel is on record that they attempted to replace X86 with IA-64 Itanium. Intel almost executed a Motorola 68K kill move on the X86, but AMD created X86-64 to extend X86 and killed Intel's Itanium adventure.

During the mainstream 64-bit desktop PC transition, IBM PowerPC 970, AMD's X86-64, and Intel IA-64 Itanium competed for 64 bit desktop computing dominance.

You have forgotten Intel's bad days.

My username for Amigaworld.net is named after AMD's K8 Hammer that smashed Intel Itanium and IBM PowerPC 970 from the desktop computing marketplace.

Intel's disloyalty is not forgotten.




kolla 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 9:39:43
[ #163 ]
Elite Member
Joined: 20-Aug-2003
Posts: 2315
From: Trondheim, Norway

If only all these pages of blablabla was code…

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Karlos 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 9:40:18
[ #164 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3144
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Hammer

Quote:
In certain aspects, Apollo Core is better than Motorola, but Apollo Core is not on the same level as the two X86 CPU vendors


Wow. It's almost as if Apollo is the output of a handful of enthusiasts from a tiny community and not from an existing behemoth with a huge install base it would be suicidal to break backwards compatibility with.

Who knew?

Quote:
My username for Amigaworld.net is named after AMD's K8 Hammer 

... but most people call me Georgio.

Last edited by Karlos on 13-Aug-2022 at 09:42 AM.

_________________
Doing stupid things for fun...

cdimauro 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 9:49:19
[ #165 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

Quote:

LOL: how to overturn the reality.

Do you understand that the 68060 has NO "slow" instruction path? What are you talking about?!?

With 68060 support software, the 68060 traps unsupported instructions and the software emulates them, hence they are on a slow path.

So, it's NOT the 68060 which has this software, but EXTERNAL software!
Quote:
Refer to CyberPatcher's or Oxyron Patcher's purpose, but they are at AmigaOS level. Hint: Coming from the X86 world, they are not good enough.

Hint: it means that the 68060 (as well as all of Motorola's other 68K processors after the 68000) is NOT backward compatible AND requires EXTERNAL software support.
Quote:
I prefer Pi-Storm/Pi-3a/Emu68's below the OS level method.

Who cares...
Quote:
I prefer X86-64's microcode firmware updates below the OS level method.

I'll reveal a secret to you: microcode firmware updates were introduced by Intel, not AMD.
Quote:
You haven't realized why I keep mentioning AMD?

1. AMD acted as credible second source insurance when Intel wanted to end X86 with IA-64 Itanium. Intel is on record that they wanted to kill the X86 just as Motorola killed 68K.

2. AMD didn't shorten X87 backward compatibility like Apollo Core's 68EC080's shortened FPU. AMD is proven to be loyal to X86, while Intel has proven to be disloyal to X86.

Again: WHO cares?!?

First, the 68080 supports ALL 68K FPU instructions of ALL processors. The ONLY thing is that it has implemented them with less precision (e.g.: double / FP64). Which means that almost all software runs without problems and only some could have issues with results. But, anyway, the software runs WITHOUT any external support like Oxyron or WHDLoad. So, it's MUCH BETTER than ANY Motorola 68K processor!

Second, AMD had NO OTHER WAY than to stay on x86 and create another HORRIBLE PATCH over this ISA, because it had NO Itanium license. So, it was a bet, trying to survive. And it succeeded only because Itanium failed, otherwise we wouldn't be here talking about it.
Quote:
https://www.youtube.com/watch?v=jB9FrBWrbOA
Dave N. Cutler speech on AMD K8 x64 processor.

Which says nothing.
Quote:
Microsoft hated Intel Itanium. Microsoft's kingmaker move, adios Intel Itanium.

Do you know that Microsoft SUPPORTED Itanium?
Quote:
You can't read.

Of course, because I have to hear it: it's a video!
Quote:
Quote:

And guess what: it made the 68060 INCOMPATIBLE with A LOT of existing software!

Apollo Core's 68EC080 follows Motorola's inferior software legacy protection tradition when compared to the X86 world.

You continue to spout bullsh*t: the 68080 is by far the MOST COMPATIBLE 68K processor EVER: way BETTER than ANY Motorola 68K.

In fact, it implements ALL GP and FPU instructions of ALL 68K processors. ALL!

So, the comparison with Motorola is absolute nonsense, like the things that you continue to repeat like a parrot.
Quote:
You haven't realized why I keep mentioning AMD?

1. AMD acted as credible second source insurance when Intel wanted to end X86 with IA-64 Itanium. Intel is on record that they wanted to kill the X86 just as Motorola killed 68K.

See above.
Quote:
You have forgotten Intel's Itanium adventure.

Care to PROVE this, dear liar? You continue your mystification because you're completely without arguments, so you have to invent things and pass them off as if I had said them, which is a complete lie!

Quote me and PROVE that I've forgotten Itanium, LIAR!
Quote:
2. AMD didn't shorten X87 backward compatibility like Apollo Core's 68EC080's shortened FPU.

Well, who said that AMD had processors without a fully-compliant FPU? It was YOU, right?

Besides that, see above on the topic.
Quote:
You can't read.

Yes: I can't read the bullsh*t and lies that you're continuously spreading.
Quote:
Quote:

But NOT the 68060. Go to Motorola and ask why they didn't provide backward-compatible processors. No, not only the 68060: I mean ALL processors AFTER the 68000.

With 68000 contexts, not a major issue with WHDLoad game patches.

Red Herring: you're trying to change the topic.

This is an EXTERNAL support because those processors are NOT backward-compatible with the previously written software.
Quote:
For A500, the pre-configured Pi-Storm/Pi-3a/Emu68's 32 GB MicroSD card is pre-loaded with many WHDLoad games. Amiga legacy games are easy to obtain.

I can switch from Witcher 508 with 68HC000 @ 50Mhz towards Pi-Storm/Pi-3a/Emu68.

Irrelevant to the topic: your usual padding...
Quote:
I use AMD as the benchmark for respecting software legacy, and Intel has attempted to kill the X86 with Itanium.

See above: nonsense.
Quote:
Apollo Core's AC68EC080 can be so much more than a follower of Motorola's disrespect for the 68K.

LOL They IMPROVED 68K compatibility over time, and this is supposed to be "disrespectful" to the 68K software? You live in a parallel universe!
Quote:
In certain aspects, Apollo Core is better than Motorola,

Certain? It's VASTLY SUPERIOR! There's NO comparison AT ALL!
Quote:
but Apollo Core is not on the same level as the two X86 CPU vendors.

STRA-LOL: you're comparing a processor made by a small "garage" team with two big companies which have been producing processors for 40 years and which last year, combined, had revenues of 100 BILLION dollars?!? You're crazy!!!
Quote:
Apollo Core still has some Motorola mindset when it comes to respecting the 68K legacy.

Absolutely not: they ADDED stuff whereas Motorola systematically REMOVED it. There's absolutely no comparison. Only a fool can write those absurdities.
Quote:
My loyalty is for myself and my own interest, not AMD.

Then, if you still have some respect left for yourself, stop writing complete bullsh*t and lies, because your reputation here has become that of a clown...
Quote:

Hammer wrote:
@cdimauro

Quote:
Sure. SOME software works, but not ALL.

You haven't realized why I keep mentioning AMD?

I've realized that you're continuously changing the topic, since you're not able to sustain your bullsh*t.

Anyway, could you please stop behaving like a parrot? Continuously repeating the same things will NOT make them true: it only shows your limits...
Quote:
Quote:

Ah, no? And then please tell me: why you've posted a video that actually... shows it

For this topic.

https://www.youtube.com/watch?v=1B1jKjrRUmk
Doom on A1200/TF1230's 68030 @ 50 Mhz with AGA vs PC's 386 40 Mhz with ET4000.

Very similar performance - Buzzing Retro Computing

Again: parrot! I've replied several times on that. You're not able to see the differences in the video, and you report the opinion of another guy.

This is because you're clearly limited: mother nature hasn't done a good job with you...
Quote:
John Carmack's argument includes the install base context.

Again, a different topic. Red Herring. You're hopeless...
Quote:
Quote:
I expect that a proper 68K processor has no problem executing ALL software which was developed BEFORE it was sold.

This is the meaning of backward-compatibility, which Intel followed and it's famous for.

But which lacked with ALL Motorola processors after the 68000.

Intel is on record that they attempted to replace X86 with IA-64 Itanium. Intel almost executed a Motorola 68K kill move on the X86, but AMD created X86-64 to extend X86 and killed Intel's Itanium adventure.

This is a completely different topic. You're the king of Red Herrings!
Quote:
During the mainstream 64-bit desktop PC transition, IBM PowerPC 970, AMD's X86-64, and Intel IA-64 Itanium competed for 64 bit desktop computing dominance.

See above, plus: when will you let me know about the instruction fusion that the PowerPC 970 was supposed (only by YOU) to perform?

You're showing that you're even much worse than Mr. Fonzarelli (besides being completely ignorant about architectures & their history).
Quote:
You have forgotten Intel's bad days.

Care to prove it, dear LIAR?
Quote:
My username for Amigaworld.net is named after AMD's K8 Hammer that smashed Intel Itanium and IBM PowerPC 970 from the desktop computing marketplace.

Intel's disloyalty is not forgotten.

This is further proof that you're a blind AMD fanatic.

For the rest, see my answers above. BTW, Intel obliterated AMD for several years after the K8 project, starting with the Core processor family.

And: http://apollo-core.com/index.htm?page=features "Enjoy"...

cdimauro 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 9:51:07
[ #166 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@kolla

Quote:

kolla wrote:
If only all these pages of blablabla was code…

Indeed. I've stolen a lot of time from working on my architecture. Damn...

@Karlos

Quote:

Karlos wrote:
@Hammer

Quote:
In certain aspects, Apollo Core is better than Motorola, but Apollo Core is not on the same level as the two X86 CPU vendors


Wow. It's almost as if Apollo is the output of a handful of enthusiasts from a tiny community and not from an existing behemoth with a huge install base it would be suicidal to break backwards compatibility with.

Who knew?

He...
Quote:
Quote:
My username for Amigaworld.net is named after AMD's K8 Hammer 

... but most people call me Georgio.

That was really fine. I don't know how many can get it. But I really liked it! Kudos!

Bosanac 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 12:10:08
[ #167 ]
Regular Member
Joined: 10-May-2022
Posts: 210
From: Unknown

Karlos 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 12:19:48
[ #168 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3144
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

I haven't seen a thread evolve like this for a while...

Karlos 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 12:45:25
[ #169 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3144
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Motorola butchered the 6888x to get it into the 68040. For the most part I'd say what they dropped from the ISA to do it was mostly sensible. The issue is that the trap and emulate solution provided for backwards compatibility hammered performance in code reliant on the missing 6888x operations. They also dropped some useful rounding operations that were reintroduced in the 68060 FPU.
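A minimal sketch of the trap-and-emulate pattern described above, for illustration only. The cycle counts and the `execute` helper are hypothetical, not measured 68040 figures; FSIN stands in for the transcendental 6888x operations that the 68040 FPU leaves to support software.

```python
import math

# Operations kept in silicon versus operations handed to a software library.
HARDWARE_OPS = {"fadd": lambda a, b: a + b, "fmul": lambda a, b: a * b}
EMULATED_OPS = {"fsin": lambda a: math.sin(a)}

NATIVE_COST = 3       # hypothetical cycles for a native FPU operation
TRAP_OVERHEAD = 200   # hypothetical cycles for exception entry, decode, return


class UnimplementedInstruction(Exception):
    """Raised by the modelled CPU when the opcode is not in hardware."""


def execute_native(op, *args):
    if op not in HARDWARE_OPS:
        raise UnimplementedInstruction(op)
    return HARDWARE_OPS[op](*args), NATIVE_COST


def execute(op, *args):
    """Run an operation, falling back to the software handler on a trap."""
    try:
        return execute_native(op, *args)
    except UnimplementedInstruction:
        # The handler produces the same result, but only after paying the trap cost.
        return EMULATED_OPS[op](*args), TRAP_OVERHEAD + NATIVE_COST


for op, args in [("fadd", (1.0, 2.0)), ("fsin", (0.0,))]:
    result, cycles = execute(op, *args)
    print(op, result, cycles)
```

The result is the same either way; only the cost differs, which is why code that leans on the missing operations suffers most.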

They also made breaking changes from the 68000 to the 68010 in order to properly segregate the user and supervisor modes (no more accessing the full status register). There were breaking changes in every major iteration if you know where to look.

Try running AmigaOS 3.x on an 040 or 060 without any corresponding 68040/68060.library and feel the compatibility.

There's no question their approach to backwards compatibility was completely different than the one taken in the x86 world.

Last edited by Karlos on 13-Aug-2022 at 01:07 PM.

cdimauro 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 15:39:28
[ #170 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Karlos: I fully agree. Of course.

Using tricks like trapping unimplemented instructions, ad-hoc math libraries (which don't trigger any trap), or patching the executables to make them compatible with the host processor are all things which just prove that the processor is NOT backward-compatible with the existing software. Full stop.

Motorola did a very bad job regarding backward-compatibility, and not only for the 68K family (PowerPC is affected as well).

NutsAboutAmiga 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 17:31:58
[ #171 ]
Elite Member
Joined: 9-Jun-2004
Posts: 12344
From: Norway

@cdimauro

Quote:
Motorola did a very bad job regarding backward-compatibility, and not only for the 68K family (PowerPC is affected as well).


Absolutely. For generic C/C++ code it's maybe not a major issue, but when you want to hand-optimize a part to make it faster, you quickly find out that the instruction is not available, or that it's dropped in another CPU, and spending the time to actually optimize something becomes discouraging. You know you might get a 20% speed increase, but on another CPU you get a 200% decrease in speed.

Last edited by NutsAboutAmiga on 13-Aug-2022 at 05:32 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Karlos 
Re: Packed Versus Planar: FIGHT
Posted on 13-Aug-2022 19:30:07
[ #172 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3144
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

Having said all that, it's interesting to see how the processors changed over time when not overly concerned with backwards compatibility.

In any case, the Apollo having broken compatibility with the 68060 iFPU is continuing in the established tradition of sticking a finger up at earlier iterations.

matthey 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 3:07:15
[ #173 ]
Super Member
Joined: 14-Mar-2007
Posts: 1684
From: Kansas

Karlos Quote:

Having said all that, it's interesting to see how the processors changed over time when not overly concerned with backwards compatibility.


Up to the 68030/80386, 68k development was competitive and straightforward with incremental improvements. The 386 removed enough of the 808x limitations and this is when the IBM clone market accelerated. Motorola wanted to remain competitive but revenue and desktop margins were higher for x86 CPUs while the 68k was losing its best margin workstation market (the 68k embedded market was growing but lower margin). Motorola tried to move ahead of the 80486 with the 68040. The 68040 added one stage to the pipeline compared to the 80486 which improved performance (it had better performance/MHz) but this uses more transistors. The deeper pipeline should have allowed the 68040 to be clocked higher but more active transistors produce more heat which was a problem using the fab processes back then. The heat was a major problem for the embedded market where additional cost is needed for cooling, power supplies and electricity bills.

With transistors already spent on the extra pipeline stage, adding a full 6888x FPU may have increased the die area enough to require a much more expensive fab process, and chip cost is very important for the embedded market as well. Castrating the 68040 FPU down to the most common instructions seems reasonable to me although the elimination of FINT/FINTRZ was a huge mistake and makes me wonder how much FPU code they analyzed before deciding what to keep (FINTRZ is very common as it is used for the default C rounding). The 68040 was a disappointment primarily due to the heat but the FPU was disappointing and the full performance of the deeper pipeline was not fully realized due to the heat and lack of better branch prediction.

The 80486 also ran hot which created problems and had inferior performance, but better margins and revenue from the significantly larger PC market, compared to a 68040 market that was practically only the Mac at this point, meant that the 80486 received chip die shrinks that made it run cooler and eventually allowed it to be clocked up more. Motorola eventually created a 3.3V and fully static 68040V which ran much cooler and could be clocked down all the way to zero which is useful in the embedded market where it saw some success, but the embedded market was not profitable enough for Motorola to die shrink chips like Intel.
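To show why FINTRZ matters for compiled C code, here is a small sketch of the two rounding behaviours. It assumes FINTRZ rounds toward zero and FINT uses the FPU's current rounding mode, taken here to be round to nearest even; the function names are illustrative only. A C float-to-integer cast truncates toward zero, which is the FINTRZ case.

```python
import math


def fintrz(x):
    """Round toward zero, keeping a floating-point result (the C cast behaviour)."""
    return float(math.trunc(x))


def fint_nearest(x):
    """Round to nearest, ties to even, as with the default rounding mode."""
    return float(round(x))


for value in (2.7, -2.7, 2.5, 3.5):
    print(value, fintrz(value), fint_nearest(value))
```

Every float-to-int conversion in compiled code needs the truncating form, so trapping it penalises very ordinary programs.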

1991 microprocessor market share by volume
1) Zilog 20%
2) Intel 18.6%
3) Motorola 14.4%

1991 microprocessor market share by revenue
1) Intel 64.3%
2) Motorola 9.3%
3) AMD 8.4%

Motorola is now looking at building the next generation CPU after the 68040. Even before the PowerPC alliance, the market reality is that the next generation needs to be better suited for the high end embedded market than the 68040. Performance should be high enough to be appealing for the desktop market as well. The 68060 moved to 3.3V, became superscalar for more parallelism, increased the pipeline length all the way to 8 stages while providing much improved branch prediction and branch folding which greatly improved loop performance. The base 68060 design was amazing but now we need to lower the cost for embedded. I have some literature that says the 68060 was to have a 2.8 million transistor budget but the AIM PowerPC alliance was formed in October of 1991 which is early enough that PPC became the Motorola next generation and the 68060 was designated to embedded only where the transistor budget may have been reduced. Sadly, the 68060 went to the chopping block which included integer instructions this time.

MC68060 User's Manual Quote:

DIVU.L ea,Dr:Dq 64/32=>32r,32q
DIVS.L ea,Dr:Dq 64/32=>32r,32q
MULU.L ea,Dr:Dq 32*32=>64
MULS.L ea,Dr:Dq 32*32=>64
MOVEP Dx,(d16,Ay) size=W,L
MOVEP (d16,Ay),Dx size=W,L
CHK2 ea,Rn size=B,W,L
CMP2 ea,Rn size=B,W,L
CAS2 Dc1:Dc2,Du1:Du2,(Rn1):(Rn2) size=W,L
CAS Dc,Du,ea size=W,L misaligned ea


The only instructions that a compiler generates are the 64 bit MUL and DIV instructions which would be my number one choice to keep and probably could have been kept with a 2.8 million transistor budget instead of 2.5 million (1994 P54C Pentium had 3.3 million transistors). MOVEP is used by some 68000 code including games but was useless on the 68020+ and poorly implemented. CHK2 and CMP2 are rare but may be used by specialized 68020+ programs. The Amiga Hardware Reference Manual says not to use CAS and CAS2 although they may be reliable in fast memory. There were a few very rarely used FPU instructions removed from the 68040 FPU for the 68060 but the addition of the very common FINT/FINTRZ is overall a big improvement and what the 68040 should have used as far as FPU instructions. The return of FINT/FINTRZ despite all the castrating shows just how bad of a mistake removing this instruction was on the 68040. Castrating instructions from the ISA certainly looked bad for desktop use especially when they were poorly chosen like FINT/FINTRZ and the 64 bit MUL and DIV instructions but I would have argued for keeping them even for embedded use.
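As a rough idea of what replacing the removed MULU.L 32*32=>64 costs, here is a sketch that builds the 64 bit product from 16 bit partial products. This is only one possible decomposition and an assumption on my part, not the code used by the actual 68060 support package; `mulu_l_64` is a made-up name.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mulu_l_64(a, b):
    """Return (high32, low32) of the unsigned product of two 32 bit values."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Four 16x16 => 32 partial products.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Sum the middle 16 bit column; its overflow carries into the high word.
    mid = (lo_lo >> 16) + (lo_hi & MASK16) + (hi_lo & MASK16)
    low = ((mid & MASK16) << 16) | (lo_lo & MASK16)
    high = hi_hi + (lo_hi >> 16) + (hi_lo >> 16) + (mid >> 16)
    return high & MASK32, low


high, low = mulu_l_64(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(high), hex(low))
```

One instruction turns into four multiplies plus shifts, masks and adds, before any trap overhead is counted.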

I hope everyone can see why the 68040 and 68060 needed to have competitive performance while lowering cost and power for the embedded market. The 68060 was an overachiever despite compromises. The balanced 68060 design competed with the higher performance Pentium design which had 32% more transistors for the P54C, a 64 bit data bus instead of 32 bit, larger instruction fetch etc., yet the 68060 has superior integer performance at the same clock speed and should have been capable of clocking higher due to the much deeper pipeline. The FPU is not fully pipelined, saving transistors, but this is a good compromise as FPU instructions are partially pipelined in the integer units, the deeper integer pipeline should have allowed for higher clock rates improving both integer and floating point performance, and floating point is more common for desktop use than embedded use. The 68060 FPU performance is nearly as good as the Pentium for the most common mixed integer floating point workloads and would likely have been ahead in real world FPU performance had the 68060 leveraged the advantage of the much deeper pipeline to out clock the Pentium. Of course the trapped instructions needed to be avoided, which requires good compilers, something the 68060 rarely received. Vbcc generated FPU code outperformed GCC and SAS/C code once those trapped FPU instructions were removed and nearly matched Pentium performance at the same clock in some benchmarks. No compiler I am aware of performs instruction scheduling for 68060 code which is important for in-order performance and makes the 68060 outperforming the Pentium that much more impressive. I believe the 68060 was the Pentium killer that Motorola needed to regain desktop and laptop market share while being a great embedded CPU but it was demoted to a high end embedded processor only and hidden away in the Motorola basement.

68060 ByteMark benchmarks
https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=44391&forum=25#847418

Karlos Quote:

In any case, the Apollo having broken compatibility with the 68060 iFPU is continuing in the established tradition of sticking a finger up at earlier iterations.


Where Motorola needed to make compromises for embedded, the Apollo team, really just Gunnar who makes all the decisions, chose to make compromises for an affordable FPGA and higher performance.

Someone asked what the precision of the Apollo Core FPU is and received the following answer from Gunnar.

Gunnar von Boehn Quote:

R5 release supports 56 Bit FPU precision.
The upcoming R6 release support 64 bit.


I'm guessing that is not fraction bits as 64 bits of fraction is full extended precision. So this is probably 64 total bits with 15 bits kept for extended precision exponent?

64(total)-15(exponent)-1(sign)=48(fraction)
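The same bit budget can be tabulated for the formats mentioned in this thread. The 15 bit exponent for the Apollo format is the guess made above, not a confirmed figure, and the helper name is made up for this sketch.

```python
def stored_fraction_bits(total_bits, exponent_bits, sign_bits=1):
    """Bits left for the significand after the sign and exponent are taken out."""
    return total_bits - exponent_bits - sign_bits


FORMATS = {
    # name: (total bits, exponent bits)
    "IEEE double": (64, 11),              # 52 stored bits, 53 with the hidden bit
    "Apollo R6, as guessed": (64, 15),    # assumption: extended-style exponent
    "68k/x87 extended": (80, 15),         # 64 bits including the explicit integer bit
}

for name, (total, exponent) in FORMATS.items():
    print(name, stored_fraction_bits(total, exponent))
```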

Double precision has 53 bits if counting the hidden bit, so 48 fraction bits would still be inferior to the WinUAE default. The WinUAE compromise isn't too bad because the default 68k AT&T SysV (Unix) ABI doesn't retain more than double precision when passing function arguments on the stack (the same is true for the x86 ABI and FPU). To fully take advantage of an extended precision FPU while providing good performance, an optimum ABI would pass floating point arguments in registers with any that don't fit placed on the stack in extended precision and all variable spills should be in extended precision. The current 68k and x86 ABIs inconsistently truncate the precision from extended precision to double precision which can lead to non-repeatable results, catastrophic failures and crashes. The problem is the default ABI only and it could be fixed with a new ABI for a 64 bit 68k ISA for example. Extended precision is nice for compatibility as well as science and engineering but it was handicapped on the 68k and x86 with the default ABI. With a few minor additions including a new rounding mode and a fused FMA instruction, full quad precision could be supported using extended-double arithmetic as it uses the same sized exponent as extended precision, which is not possible with the reduced exponent bits of double-double arithmetic.

https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic
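For anyone unfamiliar with the technique at that link, a minimal sketch of the building block behind double-double arithmetic (Knuth's TwoSum) follows. Python floats are IEEE doubles, so this shows double-double only; the extended-double variant discussed above would apply the same idea to extended precision values and is not shown.

```python
import sys
from fractions import Fraction


def two_sum(a, b):
    """Return (s, err) where s = fl(a + b) and s + err equals a + b exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    err = (a - a_virtual) + (b - b_virtual)
    return s, err


# 2**-60 is far below the 53 bit significand of 1.0, so a plain add loses it.
hi, lo = two_sum(1.0, 2.0 ** -60)
print(hi, lo)

# The pair carries the exact sum, checked here with rational arithmetic.
print(Fraction(hi) + Fraction(lo) == Fraction(1) + Fraction(1, 2 ** 60))

# The pair still has only double's exponent range, which is the limitation noted above.
print(sys.float_info.max_exp)
```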

Implementing quad precision in hardware is still expensive while extended precision was pushing the limits of the hardware when it was first implemented for the 68k and x86 FPUs but is relatively cheap now. The same does not seem to apply for FPGA or software emulation though. I would think WinUAE could provide extended precision without too much performance loss in the x86 FPU or using the x86-64 supported 64 bit integer datatype for the 64 bit extended precision fraction but maybe the guard bit, rounding bit and sticky bit create problems.

Last edited by matthey on 14-Aug-2022 at 03:14 AM.

cdimauro 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 5:37:37
[ #174 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@matthey

Quote:

matthey wrote:

Up to the 68030/80386, 68k development was competitive and straightforward with incremental improvements. The 386 removed enough of the 808x limitations and this is when the IBM clone market accelerated. Motorola wanted to remain competitive but revenue and desktop margins were higher for x86 CPUs while the 68k was losing its best margin workstation market (the 68k embedded market was growing but lower margin).

I think that this was a major Motorola mistake: they lost a very profitable market in the server / workstation area when Intel released its 386 with the features that I've listed in the other thread, which were very useful. The 68030 partially solved it, but by then it was too late.
Quote:
Motorola tried to move ahead of the 80486 with the 68040. The 68040 added one stage to the pipeline compared to the 80486 which improved performance (it had better performance/MHz) but this uses more transistors. The deeper pipeline should have allowed the 68040 to be clocked higher but more active transistors produce more heat which was a problem using the fab processes back then.

AFAIR Motorola used an internal clock with double frequency (like Intel with its 80486 DX2), and this was a major cause of the heat.
Quote:
1991 microprocessor market share by volume
1) Zilog 20%
2) Intel 18.6%
3) Motorola 14.4%

1991 microprocessor market share by revenue
1) Intel 64.3%
2) Motorola 9.3%
3) AMD 8.4%

I think that those numbers clearly say that Motorola (and Zilog) bet on the wrong horse: the embedded market wasn't so profitable. They (at least Motorola, which had this possibility) should have focused more on the desktop one (and servers / workstations), which guaranteed much bigger margins.

In this light, the continuous removal of features from its processors to better fit into the embedded market was the worst decision for Motorola.

Desktop/server/workstation market required a very stable platform because backward-compatibility is the most important factor.
Quote:
CHK2 and CMP2 are rare but may be used by specialized 68020+ programs.

By Pascal (and similar languages) compilers, where array index checking is active by default.
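
As a rough illustration of the kind of range check such a compiler emits (and which a CHK2-style instruction can do in one step), here is a minimal Python sketch. The names checked_index and pascal_get are made up for this example, and the Python exception only stands in for the trap a real 68020 would take.

```python
def checked_index(index, lower, upper):
    """Range check in the spirit of CHK2: fail if index is outside lower..upper."""
    if not lower <= index <= upper:
        # On a 68020 this would be an exception/trap, not a Python error.
        raise IndexError(f"index {index} outside {lower}..{upper}")
    return index - lower  # zero-based offset into the storage


def pascal_get(storage, lower, upper, index):
    """Read element `index` of an array declared as array[lower..upper]."""
    return storage[checked_index(index, lower, upper)]


values = [10, 20, 30]             # models: array[1..3] of integer
print(pascal_get(values, 1, 3, 2))
```

Every array access pays for this check, which is why doing both bound comparisons in a single instruction is attractive for those languages.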
Quote:
The Amiga Hardware Reference Manual says not to use CAS and CAS2 although they may be reliable in fast memory.

They are a must on multiprocessor systems = servers & workstation domain.
Quote:
There were a few very rarely used FPU instructions removed from the 68040 FPU for the 68060 but the addition of the very common FINT/FINTRZ is overall a big improvement and what the 68040 should have used as far as FPU instructions. The return of FINT/FINTRZ despite all the castrating shows just how bad of a mistake removing this instruction was on the 68040. Castrating instructions from the ISA certainly looked bad for desktop use especially when they were poorly chosen like FINT/FINTRZ and the 64 bit MUL and DIV instructions but I would have argued for keeping them even for embedded use.

Absolutely!!!
Quote:
Vbcc generated FPU code outperformed GCC and SAS/C code once those trapped FPU instructions were removed and nearly matched Pentium performance at the same clock in some benchmarks. No compiler I am aware of performs instruction scheduling for 68060 code which is important for in-order performance and makes the 68060 outperforming the Pentium that much more impressive. I believe the 68060 was the Pentium killer that Motorola needed to regain desktop and laptop market share while being a great embedded CPU but it was demoted to a high end embedded processor only and hidden away in the Motorola basement.

68060 ByteMark benchmarks
https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=44391&forum=25#847418

I don't trust those synthetic benchmarks. I would like to have the SPEC Int and FP rates, which are much more reliable for checking and comparing processor performance.
Quote:
Someone asked what the Precision of the Apollo Core FPU is and received the following answer from Gunnar.

Gunnar von Boehn Quote:

R5 release supports 56 Bit FPU precision.
The upcoming R6 release support 64 bit.


I'm guessing that is not fraction bits as 64 bits of fraction is full extended precision. So this is probably 64 total bits with 15 bits kept for extended precision exponent?

I don't think so. In literature and in this specific context precision = mantissa.

Only when you talk about FP numbers, in general, precision = full sizeof(FP datatype).

But it would be good to have a clarification about it.
Quote:
The WinUAE compromise isn't too bad because the default 68k AT&T SysV (Unix) ABI doesn't retain more than double precision when passing function arguments on the stack (the same is true for the x86 ABI and FPU). To fully take advantage of an extended precision FPU while providing good performance, an optimum ABI would pass floating point arguments in registers with any that don't fit placed on the stack in extended precision and all variable spills should be in extended precision.

I wonder why WinUAE doesn't always use the x87 extended precision: it should be able to emulate 68K's extended precision, at very good speed.

Maybe there's something that I'm missing (I've never seen its sources: it's GPL stuff).
Quote:
With a few minor additions including a new rounding mode and a fused FMA instruction, full quad precision could be supported using extended-double arithmetic as it uses the same sized exponent as extended precision and is not possible with the reduced exponent bits of double-double arithmetic.

https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic

Implementing quad precision in hardware is still expensive

Indeed. It'll come, for sure, but it's still not for our mass markets.
Quote:
I would think WinUAE could provide extended precision without too much performance loss in the x86 FPU or using the x86-64 supported 64 bit integer datatype for the 64 bit extended precision fraction but maybe the guard bit, rounding bit and sticky bit create problems.

See above: it should use the x87 with its extended precision, to solve this problem. In theory...

Karlos 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 11:48:38
#175 ]
Elite Member
Joined: 24-Aug-2003
Posts: 3144
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@cdimauro

I always thought the internal double clock of the 040 was a bit of a misnomer given that it requires the external crystal to run at that speed. It probably sounds better than saying "half clocked bus" :)

_________________
Doing stupid things for fun...

Hypex 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 12:19:42
#176 ]
Elite Member
Joined: 6-May-2007
Posts: 10830
From: Greensborough, Australia

@cdimauro

Quote:
It's not like 3D, but yes: in this specific case you've to unpack the source pixels to reach the size of the destination. Difficult for a CPU (but easier for one having a SIMD unit), but much easier in hardware.


I didn't elaborate on 3D but yes, you can see what I meant. My point about 3D was that the 3D hardware is designed to scale pixels in hardware. Usually the pixels would be the same width, but it still needs to take a set of data and expand it or shrink it to fit a different sized set.

Quote:
No. If the source has a different size compared to the destination (for example: 3 bits source, 5 bits destination), then you have to blit the missing bitplanes anyway, otherwise you mess up the graphics.


In that kind of case it wouldn't be set up to be optimal.

Quote:
Unless your framebuffer + CLUT is organized in a way to implement some transparency effect using one or two (maximum: going over this makes less sense, because you're wasting the color palette only for the transparency effects) bitplanes.


That would be a better case. It seems wasteful, but yes, that is the kind of operation where you can use fewer planes but show more colours.

Quote:
That's because you haven't written videogames on Amiga.


I haven't, no. The closest would be an incomplete game engine scroller using OS routines to do the work. But I have spent time in the past hitting the hardware directly with ASM experiments I played with.

Quote:
We're talking about a system similar to the (original) Amiga but with packed graphic instead.


Yes and taking it to the max! Around the same time other computers had large palettes. Such as Atari 400/800 that had 256 max. A C16 could display 121 colours without special tricks in 1984, even though the palette was organised differently, so an Amiga being limited to 32 colours without tricks in 1985 looks more limited.

Quote:
When then 8-bit pixel sizes age came then we had more hardware resources (CPU, memory) and it was more evident that packed was the way to go.


With the move to 3d also it became apparent that direct pixel access in one byte was suitable.

Quote:
Me neither, but I've some idea. As I've already said, it's implementing the masking which is more complex with packed graphics: the rest is simpler.


The masking yes would need some work in the operation.

Hypex 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 14:11:57
#177 ]
Elite Member
Joined: 6-May-2007
Posts: 10830
From: Greensborough, Australia

@matthey

Quote:
The TI TMS34010 chip had more advanced graphics capabilities than the Amiga custom chips when it came out. It was higher end and cost significantly more. It could have been used on a Zorro card like the A2410 graphics card with RTG to provide the Amiga with a high end graphics solution. The Zorro II bus bottlenecked the memory bandwidth and even the bugs in Zorro III would have limited the performance. CBM could have licensed the technology from TI and integrated it eliminating the cost of a high performance bus as TI was trying to license the technology to be used in the console market. CBM's vision was to reduce the cost of the Amiga into a C64 and they didn't seem interested in producing a high performance Amiga. The high end Amiga hardware market was left for 3rd party companies that could not achieve the economies of scale to be competitive.


Suppose they needed what they planned for later on. A modular design. But, they would have needed to break and expand the range. The Amiga would have needed to expand into a brand line instead of being one hardware model. However, Amiga users are quite particular, and don't like incompatibility. On the one hand they are happy to add busboards and PC cards to update the sound and video while on the other hand say it's not an Amiga without the chipset.

Quote:
The Amiga 500 was cheaper to manufacture than the C128. It was the C128, CDTV and Amiga 600 that didn't make sense. These "mistakes" were too expensive and divided the market between CBM products instead of fewer products with more margin increasing economies of scale.


The CDTV was cool, but it was misunderstood, and looked like a CD player.

Quote:
As I recall from ThoR, AmigaOS 4 uses compositing/overlay support to allow screen dragging but this required too much CPU performance for AmigaOS 3.


The point of compositing is to use the hardware which is how it should be done. Without the copper using a blitter is the next best thing. Still, I've seen OS3 do some form of screen dragging. Must have been when I was using P96. In any case the RTG hardware had all sorts of blitter features so if they aren't being put to use it's the typical case of the Amiga not having a full hardware driver.

Quote:
Dedicated hardware, like a blitter or DSP, allows to use a cheaper CPU and lower the hardware cost. However, upgrading the CPU performance and capabilities gives more performance all the time for general purpose use and not just when blitting or doing DSP workloads. Using a thread for blitting uses CPU performance that would normally be wasted during stalls.


I like the efficiency involved. Some things like audio moved in the other direction, to simple DACs and using software to mix. But usually the CPU is busy with other tasks while offloading a blit.

Quote:
I believe the blitter per plane was a rumor (from Dave Haynie comment?) and likely never seriously considered. The big obstacle is a memory access per parallel blitter. The memory could be banked (or have separate memory controllers) but for each access to fall within a memory range corresponding to that bank requires rigid resolutions as some PC graphics hardware used. Another option would be to divide chip memory into different banks with separate memory pools and allocate each bitplane of a bitmap in a different bank but this would have created more fragmentation and reduced the amount of available chip memory. Any option like this would have likely been prohibitively expensive as well. The blitter ALU work time is short for a simple blitter that is clocked high so I believe a pipelined blitter with pipelined memory accesses makes more sense.


I'm not sure how parallel would have worked with the memory accesses involved. But, the planes all needed reading as well. I'm not sure if they were read in parallel or in serial. In any case, bitmap data of some size needed to be moved from one spot with some arrangement of planes, to another which may have another arrangement of planes. At the end of the day the programmer just needed it to do all planes regardless of how it was done. It just needed to be automated and work with more planes.

Quote:
I couldn't find the cost of a 68020 in 1985 but one source said it would cost $150 for the CPU while hardware expense for a full 32 bit CPU would increase much more than this. It still could have been worthwhile considering how many times the performance would increase and the much better shift and new bitfield instructions which could have replaced the blitter. Software blitting is often used on 68020+ Amigas so I would assume the performance is adequate on a 68020.


For small operations obviously using a CPU is better. Even better with bitfield operations. Of course the blitter was made for larger copies where needing to loop through every line came for free.

Quote:
The 68060 can usually do a mask and shift in fewer cycles than using a bitfield instruction so code will usually be faster when not using them. The bitfield instructions do improve code density and the 68060 is sensitive to large code so it is beneficial to use them when not in performance optimized loops but compilers are often not smart and the best performance for 68060 optimized code is to turn all bitfields off. The 68040 is the opposite as bitfield instructions take the fewest cycles relative to shifts so they should always be used in 68040 optimized code. The 68020 and 68030 are between the 68040 and 68060 and it is usually better to use them for better code density though the performance difference isn't going to be much.
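
For readers who haven't used them, the mask and shift alternative to an unsigned bitfield extract can be sketched as below. This is only a Python model, not 68k code: the function name bfextu_model is invented here, it assumes a 32 bit register operand with the offset counted from the most significant bit, and it ignores the wrap-around case where offset + width exceeds 32.

```python
def bfextu_model(value, offset, width, bits=32):
    """Unsigned field extract done as one shift plus one mask.

    offset counts from the most significant bit (offset 0 = bit 31),
    and offset + width is assumed not to exceed `bits`.
    """
    shift = bits - offset - width   # distance from the field to bit 0
    mask = (1 << width) - 1         # `width` low bits set
    return (value >> shift) & mask


# The second byte from the top of 0x12345678 is 0x34.
print(hex(bfextu_model(0x12345678, 8, 8)))
```

When the offset and width are known at compile time, the shift count and mask are constants, which is why a shift plus an AND can compete with a single bitfield instruction on cycle count while costing more code bytes.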


I should probably test more. I used FSUAE as a guide then tested on the real thing. The real thing was much slower! I really need a profile to match the real thing but such a feature would complicate the emulation as it would need to throttle the JIT. Beyond that it gets exhausting modifying routines and recompiling and testing.

Quote:
As part of the Apollo team, we looked closely at bitfield instructions. With Virtual GP then, Dungeon Master uses bitfield instructions often but many games don't use them at all. Some compilers generate bitfield instructions very often while others not at all. The surprising amount of compiler generated bitfield instructions shows how well compilers can use them and how general purpose they are which is important for deciding whether they are worthwhile. They improve code density modestly as well. The final factor of whether they should be included is whether they can be optimized to few enough cycles and indeed they can. Perhaps the 68060 was trying to do away with them by not optimizing them. The mask and shift method would sometimes be as fast but the bitfield instructions usually give better code density. Gunnar wanted to trap them for the Apollo core but Meynaf and I managed to convince him to include them and he even optimized them for an Apollo core advantage. That may have been the only debate Meynaf and I won against Gunnar where Gunnar actually seemed to change his mind.


Amazing, he took aboard advice and worked with a team. Where could it go wrong? The bitfields use quirky ASM with curly braces. This is on top of scale mode where the asterisk looks quirky to me. Suppose 68020 ASM speak just looks quirky to me. Paragraphs and commas are what I like to stick to.

Quote:
RISC philosophy was to eliminate hardware support for misaligned memory accesses in the CPU and most early RISC architectures didn't support it (PPC was one of the first RISC architectures to optionally but often support them in big endian mode). Today, most RISC architectures have adopted this CISC like feature like so many other useful CISC associated features. Yes, even a blitter could handle misaligned memory accesses and I wouldn't be surprised if it was actually worthwhile despite taking a few more transistors.


Suppose, against combining word bits into pixel indexes, that some masking is needed regardless. Though, pixel bits would need combining, as it shifts along each beat. Given it can be done in set word amounts it could possibly use a preset mask table. As well as shift, mask and combine.
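
A minimal sketch of that shift, mask and combine idea with a preset mask table, written in Python for clarity. It assumes 16 bit words and a source word landing at an arbitrary bit offset in the destination. The names MASKS and blit_word are hypothetical and this is not how any real blitter is specified.

```python
WORD_BITS = 16
WORD_MASK = 0xFFFF

# Preset table (hypothetical): MASKS[s] selects the bits of the first
# destination word that a source word shifted right by s will overwrite.
MASKS = [WORD_MASK >> s for s in range(WORD_BITS)]


def blit_word(dest, index, src_word, shift):
    """Write one 16 bit source word starting `shift` bits into dest[index]."""
    first = MASKS[shift]
    # First word: keep the untouched high bits, combine with the shifted source.
    dest[index] = (dest[index] & ~first & WORD_MASK) | (src_word >> shift)
    if shift:
        # Bits shifted out of the first word spill into the top of the next one.
        spill = (src_word << (WORD_BITS - shift)) & WORD_MASK
        dest[index + 1] = (dest[index + 1] & first) | spill
    return dest


print([hex(w) for w in blit_word([0xFFFF, 0xFFFF], 0, 0xABCD, 4)])
```

The table lookup replaces computing the mask per word, which is the part that can be fixed in advance when the shift only takes a small set of values.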

cdimauro 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 21:11:27
#178 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3084
From: Germany

@Karlos
Quote:

Karlos wrote:
@cdimauro

I always thought the internal double clock of the 040 was a bit of a misnomer given that it requires the external crystal to run at that speed. It probably sounds better than saying "half clocked bus" :)

In my defence, I only recall that the 040 used a double clock internally for computations (e.g. 50 MHz for an 040 sold as 25 MHz).


@Hypex
Quote:

Hypex wrote:
@cdimauro

Quote:
That's because you haven't written videogames on Amiga.


I haven't, no. The closest would be an incomplete game engine scroller using OS routines to do the work. But I have spent time in the past hitting the hardware directly with ASM experiments I played with.

IMO it's not enough to understand the challenges of developing a videogame for an Amiga. Especially to have a concrete idea of the bottlenecks of the system and the parts which are more critical / compute-intensive.

Just my idea, in any case.
Quote:
Quote:
We're talking about a system similar to the (original) Amiga but with packed graphic instead.


Yes and taking it to the max! Around the same time other computers had large palettes. Such as Atari 400/800 that had 256 max. A C16 could display 121 colours without special tricks in 1984, even though the palette was organised differently, so an Amiga being limited to 32 colours without tricks in 1985 looks more limited.

Indeed. EHB was good, but it had to be expanded, like what Archimedes did.

@Hypex
Quote:

Hypex wrote:
@matthey

Quote:
I believe the blitter per plane was a rumor (from Dave Haynie comment?) and likely never seriously considered. The big obstacle is a memory access per parallel blitter. The memory could be banked (or have separate memory controllers) but for each access to fall within a memory range corresponding to that bank requires rigid resolutions as some PC graphics hardware used. Another option would be to divide chip memory into different banks with separate memory pools and allocate each bitplane of a bitmap in a different bank but this would have created more fragmentation and reduced the amount of available chip memory. Any option like this would have likely been prohibitively expensive as well. The blitter ALU work time is short for a simple blitter that is clocked high so I believe a pipelined blitter with pipelined memory accesses makes more sense.


I'm not sure how parallel would have worked with the memory accesses involved. But, the planes all needed reading as well. I'm not sure if they were read in parallel or in serial. In any case, bitmap data of some size needed to be moved from one spot with some arrangement of planes, to another which may have another arrangement of planes. At the end of the day the programmer just needed it to do all planes regardless of how it was done. It just needed to be automated and work with more planes.

Work was already hard serializing the blits with the single Blitter that the Amiga had: I can't imagine the mess of handling several of them in parallel...

matthey 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 22:31:55
#179 ]
Super Member
Joined: 14-Mar-2007
Posts: 1684
From: Kansas

cdimauro Quote:

I think that this was a major Motorola mistake: they lost a very profitable market in the servers / workstation area when Intel released its 386 with the features that I've listed on the other thread, which were very useful. 68030 partially solved it, but then it was too late.


You are jumping to conclusions when the reality was nuanced. Some of the workstation market had created their own MMUs and wanted to continue using them which allowed for compatibility and customization. Cost was not as important for workstations as for the desktop or embedded markets. Getting the simpler 68020 without an MMU out sooner was generally a good thing for the server market which wanted 32 bit more than they needed a simple standard MMU integrated. The embedded market mostly didn't need an MMU at that time either. The largest benefit of an integrated MMU was for the desktop market where an MMU was often desirable but the lower cost of a single chip was important. Motorola was slow to get the 68851 external MMU chip out so even CBM started to create their own MMU for the 68020 according to Dave Haynie (before CBM decided eliminating the MMU was a cost reduction?).

https://groups.google.com/g/comp.sys.mac.hardware/c/CStPfTPWGOU/m/dNJYr8leh0oJ

The 68851 was not ready when the 68020 was introduced which likely means it couldn't have been integrated then and including it would have caused a delay of the 68020. It is certainly worthwhile to integrate the MMU as it is very cheap, the performance much improved and the overall cost lower when an MMU is desired but there were sometimes compromises back then to fit it like a simpler MMU with reduced TLB entries compared to an external MMU chip. The Motorola mistake was taking so long to get the 68030 with MMU out the door which didn't launch until 1987 some 3 years after the 68020. I would have canceled the 68851 and prioritized the 68030. Still, Motorola did not lose its desktop customers of CBM, Apple and Atari. The workstation market was being lost due to RISC hype, ease of design and ease of clocking them up (power and heat are not as important for the workstation market) but the divided RISC market was poor for economies of scale allowing x86 to ride in later on the back of an upgraded desktop PC giant.

cdimauro Quote:

I think that those numbers clearly say that Motorola (and Zilog) bet on the wrong horse: the embedded market wasn't so profitable. They (at least Motorola, which had this possibility) should have focused more on the desktop one (and servers / workstations), which guaranteed much bigger margins.


Did ARM bet on the wrong "embedded market" horse? ARM has leveraged embedded market economies of scale to make an assault on the desktop and server markets, albeit with a more CISC like "RISC" and mostly falling short so far. The embedded market is also consistent (defensive) and growing where the desktop market is cyclical, smaller and less important than the combined mobile and embedded markets. Yes, x86-64 has the growing and very profitable server market too but has had trouble scaling down to ARM embedded territory. Motorola actually had a good proportion of the markets from embedded to desktops to workstations which neither x86(-64) or ARM has been able to accomplish. Motorola certainly could have focused more on the middle desktop market but it was the smallest of the 3.

cdimauro Quote:

In this light, the continuous removal of features from its processors to better fit into the embedded market was the worst decision for Motorola.

Desktop/server/workstation market required a very stable platform because backward-compatibility is the most important factor.


The desktop may benefit the most from a stable ISA. Servers/workstations and embedded hardware often use Linux/BSD where the OS is simply recompiled for whatever hardware is used. It's on personal computers, including mobile, that people want to download and buy general purpose software without recompiling it, and where software is better optimized for the hardware due to standardization.

cdimauro Quote:

I don't trust those synthetic benchmarks. I would like to have the SPEC Int and FP rates, which are much more reliable for checking and comparing processor performance.


BYTEmark/NBench is no more synthetic than SPECint. Both are test suites of multiple real world algorithms that give a combined score. I believe the SPECint choice of real world algorithms is better but an older version would likely need to be used and it is not free.

cdimauro Quote:

I don't think so. In literature and in this specific context precision = mantissa.

Only when you talk about FP numbers, in general, precision = full sizeof(FP datatype).

But it would be good to have a clarification about it.


I believe there are 3 possibilities as described but I excluded the first because the FPU would support full extended precision if the fraction/mantissa=64.

1) sign=1, exponent=15, fraction/mantissa=64 (full extended precision)
2) sign=1, exponent=15, fraction/mantissa=48 (native 64 bit FPU format)
3) sign=1, exponent=11, fraction/mantissa=52 (full double precision datatype in memory)

Since the 3rd option is full double precision and the same as WinUAE default, I lean toward that being correct which is sizeof(double)*8=64 bits of precision. The exponent is likely kept at 15 bits internally for compatibility but he doesn't count that when talking about double precision. The fraction/mantissa is what is difficult to handle in FPGA and not the narrower exponent. Full extended precision requires a 67 bit ALU and barrel shifter for common normalizing. The 68881 FPU could shift any number of places in one cycle which modern FPGAs may have trouble doing.
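
To make the bit counting above easy to check, here is a small Python tally of the three candidate layouts. The field widths are the ones listed in this post. The 64 + 3 breakdown of the 67 bit ALU is my assumption (fraction plus guard, round and sticky bits), not something Gunnar stated.

```python
# Candidate (sign, exponent, fraction) bit widths from the list above.
FORMATS = {
    "full extended precision": (1, 15, 64),
    "native 64 bit FPU format": (1, 15, 48),
    "double precision in memory": (1, 11, 52),
}


def total_bits(sign, exponent, fraction):
    """Total storage width of a format given its three field widths."""
    return sign + exponent + fraction


TOTALS = {name: total_bits(*fields) for name, fields in FORMATS.items()}

# Assumption: 64 fraction bits plus guard, round and sticky bits.
EXTENDED_ALU_BITS = 64 + 3

for name, bits in TOTALS.items():
    print(f"{name}: {bits} bits")
print(f"ALU width for full extended precision: {EXTENDED_ALU_BITS} bits")
```

Options 2 and 3 both come to 64 bits in total, which is why "64 bit" on its own does not say which layout is meant.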

cdimauro Quote:

Indeed. It'll come, for sure, but it's still not for our mass markets.


Since quad precision floating point is not here now for PCs, I'm not so sure full quad precision hardware ALUs will come. Extended-double arithmetic would be the next best solution to provide full quad precision support with good hardware acceleration and without the cost of a wider ALU than is required for extended precision.
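
For anyone unfamiliar with the technique, the basic building block of double-double (and, by extension, extended-double) arithmetic is an error-free addition that returns the rounded sum plus the exact rounding error as a second value. Below is a minimal Python sketch using Knuth's TwoSum on ordinary doubles. Python has no extended precision type, so this only shows the double-double case, and the function name two_sum is just a label for this example.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free addition: returns (s, e) where s is the rounded sum
    and e is the rounding error, so that s + e equals a + b exactly."""
    s = a + b
    bb = s - a                      # the part of b that made it into s
    e = (a - (s - bb)) + (b - bb)   # what was lost to rounding
    return s, e


s, e = two_sum(1.0, 1e-17)
# The high part alone loses the small term, the pair keeps it.
exact = Fraction(1.0) + Fraction(1e-17)
print(s, e, Fraction(s) + Fraction(e) == exact)
```

A pair of values gains fraction bits this way but not exponent range, which is the limitation of double-double mentioned earlier and the reason extended-double, with its 15 bit exponent, is the better fit for quad precision.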

Last edited by matthey on 14-Aug-2022 at 10:40 PM.

MEGA_RJ_MICAL 
Re: Packed Versus Planar: FIGHT
Posted on 14-Aug-2022 22:44:07
#180 ]
Super Member
Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

PADDING

_________________
I HAVE ABS OF STEEL
--
CAN YOU SEE ME? CAN YOU HEAR ME? OK FOR WORK

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle