MEGA_RJ_MICAL 
Re: Apple moving to arm, the end of x86
Posted on 14-Jul-2020 19:52:37
#121 ]
Super Member
Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

@evilFrog

Quote:

evilFrog wrote:
@MEGA_RJ_MICAL

Disappointed you didn’t post the circuit diagram for ZORRAM.


What is that?

_________________
I HAVE ABS OF STEEL
--
CAN YOU SEE ME? CAN YOU HEAR ME? OK FOR WORK

Hammer 
Re: Apple moving to arm, the end of x86
Posted on 15-Jul-2020 3:07:04
#122 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5246
From: Australia

@megol
Quote:

But not the only one, and it's one feature that isn't really a defining characteristic anymore. ARM, MIPS and RISC-V are all variable length. AArch64 aka ARM64 isn't, though.

Not all modern x86 use fixed length internal instructions BTW.

The so-called RISC CPUs with limited variable-length instructions are not pure old school RISC.

Fixed-length instructions make it easier for superscalar pipelining.
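
One way to see this point: with a fixed 4-byte encoding, the start of every instruction in a fetch block is known before any decoding happens, while with a variable-length encoding each start depends on the length of the instruction before it. A minimal Python sketch follows. The toy encoding (first byte of each instruction holds its length) and the `length_of` callback are assumptions made up for illustration, not any real ISA.

```python
def fixed_length_starts(block_size, width=4):
    """Instruction start offsets for a fixed-width encoding.

    Every offset is computed independently of the others, so each slot
    could be handed to its own decoder in parallel.
    """
    return list(range(0, block_size, width))


def variable_length_starts(block, length_of):
    """Instruction start offsets for a variable-length encoding.

    Each start depends on the length of the previous instruction, so
    this scan is sequential.
    """
    starts = []
    offset = 0
    while offset < len(block):
        starts.append(offset)
        offset += length_of(block, offset)
    return starts


# Toy encoding (an assumption): the first byte of each instruction is its length.
toy_block = bytes([2, 0, 4, 0, 0, 0, 6, 0, 0, 0, 0, 0])


def toy_length(block, offset):
    return block[offset]


print(fixed_length_starts(12))                        # [0, 4, 8]
print(variable_length_starts(toy_block, toy_length))  # [0, 2, 6]
```

Real variable-length decoders get around the sequential scan with tricks such as predecode bits or speculative length decoding, which this sketch does not model.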

Quote:

Yes and? The instructions are still stored as macro instructions that are split into one or more when the right time comes; they are treated as one instruction for retirement purposes, with some other complications. The reason for separate AGU and ld/st units is to increase instruction throughput in an out-of-order execution design, simple as that.

When breaking down complex instructions, it's for atomic operation.

AMD K5 recycles AMD 29K RISC CPU core.

https://en.wikichip.org/wiki/micro-operation
Intel refers to the internal operations of fixed length, regular format, and encoding as micro-operations. Those are a result of decoded macro-operations.

AMD refers to the simple, single-operation (e.g. a single arithmetic or memory operation) as a micro-operation. Those µOPs make up a potentially more complex macro-operation.

ARM refers to the internal representation of instructions as micro-operations. Those are a result of decoded instructions or may be part of a group of µOPs as a macro-operation.
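
To make the terminology concrete, here is a toy Python sketch of cracking one read-modify-write macro-instruction into simpler µops. The µop names, the `tmp` register and the textual operand format are invented for illustration; the real internal formats used by Intel, AMD and ARM cores are not public and are not what is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    kind: str  # "load", "alu" or "store"
    dest: str
    src: str


def crack_add_mem_reg(address, reg):
    """Split a hypothetical `add [address], reg` into three µops.

    The three µops stay grouped so they can retire together as one
    architectural instruction.
    """
    return [
        MicroOp("load", dest="tmp", src=f"[{address}]"),
        MicroOp("alu", dest="tmp", src=reg),  # tmp = tmp + reg
        MicroOp("store", dest=f"[{address}]", src="tmp"),
    ]


uops = crack_add_mem_reg("rbx", "rax")
print([u.kind for u in uops])  # ['load', 'alu', 'store']
```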


Quote:

Now realize that a high-performance x86 will require complicated tracking per instruction to support very rare special cases where an instruction has to be re-executed in a weird way to be compatible.

GPUs have larger combo instructions such as
1. gather and scatter instructions
2. instructions and data payload for BVH and intersection test hardware.

RDNA wave32 and wave64 contain multiple instructions within a payload wavefront issue.
GCN only supports wave64.

Quote:

Or having to track some instruction chunks as one instruction. Micro-exceptions to handle special cases for common instructions (don't know if current Intel and AMD designs do that anymore).

Newer X86 compilers are optimized to minimize multi-clock cycle instructions.



Quote:

A microcode sequence is treated as a single instruction while it has a complex sequence of operations that has to be flushed if misspeculated, with extra tracking in timing sensitive paths to handle this and the more common but still complex flushing.

X86 isn't RISC.

Modern X86 is not an old school CISC.

On your mis-speculated argument, offer a faster RISC CPU alternative over Ryzen 9 3900X in Blender 3D 2.83 running BMW27 demo benchmark.


Quote:

Let's say they do generate optimal code (they don't), what would the significance be?

Newer C++ X86 compilers generate optimal code e.g. X87 is not being generated under AMD64.

Quote:

ever said it can be. But if you think Intels 14nm++++++++++++++ process is better than TSMCs 7nm++ currently in production well...
Note that Intel haven't used their newer 10nm-the-return-and-actually-working-kind-of process for high performance designs which illustrates your point that a simple nm marketing isn't worth much.

Intel has 10 nm in higher margin laptops e.g. Ice Lake. LOL

Intel 14nm++++++++++++++ can easily reach ALL CORES >5 GHz while TSMC 7nm Zen 2 is around ALL CORES 4.4 GHz.

Intel has supply issues and can't make the full 10 nm process tech transition.

Last edited by Hammer on 15-Jul-2020 at 03:23 AM.
Last edited by Hammer on 15-Jul-2020 at 03:21 AM.
Last edited by Hammer on 15-Jul-2020 at 03:19 AM.
Last edited by Hammer on 15-Jul-2020 at 03:08 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

MEGA_RJ_MICAL 
Re: Apple moving to arm, the end of x86
Posted on 15-Jul-2020 6:06:34
#123 ]
Super Member
Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

@Hammer

Quote:

Fixed-length instructions make it easier for superscalar pipelining.


That's what my granny always used to say!
Finally, It was about time someone pointed this out.

We can now shut down the forums once and for all,
confident that we have done all we could for the Amiga,
boats against the current, borne back ceaselessly into the past.

MRJM
THE END!

Hypex 
Re: Apple moving to arm, the end of x86
Posted on 15-Jul-2020 14:49:44
#124 ]
Elite Member
Joined: 6-May-2007
Posts: 11180
From: Greensborough, Australia

@matthey

Quote:
Prefixes are not necessary. Gunnar does not use prefixes although it offers minimal 64 bit support other than SIMD instructions, especially for 64 bit addressing (not currently used by AmigaOS but AROS could be testing 64 bit 68k addressing). I prefer a 64 bit mode which is partially re-encoded and does not need a prefix.


For 68K prefixes would be inappropriate. The ISA isn't designed that way. x86 prefixes are read by the byte in little endian order so suitable for the ISA. But for 68K there is a restriction to 16-bit for both instructions and data. The format is instruction code and then parameters. So for 68K, it would need to be a postfix, so to speak.

Gunnar has also extended the copper. But I think this is inappropriate even if convenient. The copper is like PPC and extending it breaks code structure. Each copper instruction is 32 bits long. He wants to add a 32-bit write, but that will push the copper list codes out of long word alignment. A 64-bit long write code would be more suitable with 32-bit data inside, but would also take up the same space as two 16-bit writes. The copper is not a 68K.
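
As a rough illustration of the alignment concern, the sketch below encodes standard Copper MOVE instructions (a 16-bit register offset followed by 16 bits of data, 4 bytes in total) and compares them with a made-up 48-bit form carrying 32 bits of data. The `hypothetical_move32` encoding is an assumption used only to show the effect on alignment; it is not the actual Apollo/Vampire encoding, which is not described in this thread.

```python
import struct

COLOR00 = 0x180  # offset of the background colour register from the custom chip base


def copper_move(register_offset, value):
    """Encode a standard Copper MOVE: 16-bit register offset, then 16-bit data."""
    return struct.pack(">HH", register_offset & 0x01FE, value & 0xFFFF)


def hypothetical_move32(register_offset, value):
    """Made-up 48-bit form (16-bit offset + 32-bit data), for illustration only."""
    return struct.pack(">HI", register_offset & 0x01FE, value & 0xFFFFFFFF)


def instruction_offsets(instructions):
    """Byte offset at which each encoded instruction starts in the list."""
    offsets, position = [], 0
    for encoded in instructions:
        offsets.append(position)
        position += len(encoded)
    return offsets


standard = [copper_move(COLOR00, 0x0F00), copper_move(COLOR00, 0x00F0)]
mixed = [hypothetical_move32(COLOR00, 0x00FF0000), copper_move(COLOR00, 0x00F0)]

print(instruction_offsets(standard))  # [0, 4]
print(instruction_offsets(mixed))     # [0, 6]
```

In the mixed list the second instruction starts at byte 6, which is no longer a multiple of 4, whereas a padded 64-bit form would keep every instruction on a long word boundary.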

I'm not sure about a 64 bit 68K mode. Would it be a SR flag and recode certain instructions to be different? The way I see it, 64-bit codes should fit in as well as any other, with the ability to choose byte, word, long and double long. Or quad. Guess long long is used in C. So, move.b, move.w, move.l and move.ll (or move.dl or move.q) should all naturally mix together. I don't know if there is space in the size bitfield.

matthey 
Re: Apple moving to arm, the end of x86
Posted on 15-Jul-2020 20:19:44
#125 ]
Super Member
Joined: 14-Mar-2007
Posts: 1968
From: Kansas

Quote:

Hypex wrote:
For 68K prefixes would be inappropriate. The ISA isn't designed that way. x86 prefixes are read by the byte in little endian order so suitable for the ISA. But for 68K there is a restriction to 16-bit for both instructions and data. The format is instruction code and then parameters. So for 68K, it would need to be a postfix, so to speak.


x86 didn't have as many choices due to lack of encoding space. It was either replace instructions (they did some of this too), add prefixes or re-encode the instructions (not binary compatible but probably the best choice for the mess). The 68k can split open encodings without prefixes or postfixes, although this also makes instructions longer.

Quote:

I'm not sure about a 64 bit 68K mode. Would it be a SR flag and recode certain instructions to be different?


Yes

Quote:

The way I see it, 64-bit codes should fit in as well as any other, with the ability to choose byte, word, long and double long. Or quad. Guess long long is used in C. So, move.b, move.w, move.l and move.ll (or move.dl or move.q) should all naturally mix together. I don't know if there is space in the size bitfield.


That is my logic too. Most instructions should support byte (8 bit), word (16 bit), long (32 bit) or quad (64 bit) using the same instruction format. The 68k size bitfield is 2 bits so 2^2=4 sizes which works out perfectly. Unfortunately, some of the encodings which could be used for the 64 bit size were reused for other instructions. Many of these instructions are newer 68020 instructions which can be eliminated and the others re-encoded. A 64 bit mode allows cleaning up the mess.
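
The size field argument can be sketched in Python. For instructions such as CLR, NEG, TST and ADDI the size lives in bits 7-6 of the 16-bit opword, with 00 = byte, 01 = word and 10 = long; the fourth code is taken by other instructions (for example, the CLR pattern with size 11 is MOVE from CCR on the 68010 and later). The `SIZES_64BIT_MODE` table below is purely an assumption showing what a re-encoded 64 bit mode could do with that fourth code; no such mode is defined by Motorola.

```python
SIZES_68K = {0b00: ("byte", 8), 0b01: ("word", 16), 0b10: ("long", 32)}

# Assumption: a hypothetical 64 bit mode that hands the fourth code to quad.
SIZES_64BIT_MODE = {**SIZES_68K, 0b11: ("quad", 64)}


def decode_size(opword, table):
    """Return (name, bits) for the size field in bits 7-6, or None.

    None means the code is not an operand size in this table, i.e. the
    opword belongs to some other instruction.
    """
    field = (opword >> 6) & 0b11
    return table.get(field)


CLR_L_D0 = 0x4280      # CLR.L D0
SIZE_11_CODE = 0x42C0  # same pattern with size bits 11 (MOVE from CCR,D0 on 68010+)

print(decode_size(CLR_L_D0, SIZES_68K))             # ('long', 32)
print(decode_size(SIZE_11_CODE, SIZES_68K))         # None
print(decode_size(SIZE_11_CODE, SIZES_64BIT_MODE))  # ('quad', 64)
```

Note that MOVE itself uses a different size field (bits 13-12, with its own code assignments), so this sketch only covers the bits 7-6 family.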

The x86_64 often does *not* support 64 bit immediates requiring a MOV+OP. Most 32 bit operations have their results zero extended to 64 bits, which is different behavior than for 16 bit or 8 bit operations (which leave the upper bits untouched), although it works out well. Writing the whole register makes result forwarding easier but can use more power processing those upper bits. The 68k always sign extends address register results to the address register size giving them a performance advantage over data registers. Data registers are more flexible and I would rather keep the handling of different data sizes consistent, unlike x86_64.
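
A small Python model of these write rules may help. On x86_64 a 32 bit result clears bits 63..32 of the destination register, while 8 and 16 bit results are merged into the old value; on the 68k a word written to an address register is sign extended to the full 32 bits. The function names are made up for this sketch, and the high-byte registers (AH and friends) are not modelled.

```python
MASK64 = (1 << 64) - 1


def x86_64_write(old, value, width):
    """Model writing a `width`-bit result to a 64-bit x86_64 register."""
    mask = (1 << width) - 1
    value &= mask
    if width in (32, 64):
        return value  # a 32-bit write clears the upper 32 bits
    return (old & ~mask & MASK64) | value  # 8/16-bit writes keep the upper bits


def m68k_address_write_word(value):
    """Model a word move into a 32-bit 68k address register (sign extended)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if (value & 0x8000) else value


old = 0x1122334455667788
print(hex(x86_64_write(old, 0xFFFFFFFF, 32)))  # 0xffffffff
print(hex(x86_64_write(old, 0xFFFF, 16)))      # 0x112233445566ffff
print(hex(m68k_address_write_word(0x8000)))    # 0xffff8000
```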

megol 
Re: Apple moving to arm, the end of x86
Posted on 17-Jul-2020 23:10:37
#126 ]
Regular Member
Joined: 17-Mar-2008
Posts: 355
From: Unknown

@matthey
I wanted to write a proper response without available time and then forgot about it (rarely visit here so nothing to remind me). Sorry.

matthey wrote:
Quote:

Prefixes are not necessary. Gunnar does not use prefixes although it offers minimal 64 bit support other than SIMD instructions, especially for 64 bit addressing (not currently used by AmigaOS but AROS could be testing 64 bit 68k addressing). I prefer a 64 bit mode which is partially re-encoded and does not need a prefix. A-line can provide MOVE.Q and re-encoding can provide OP.Q instructions which simplifies decoding (Gunnar wanted to make this encoding simplification but it causes incompatibility if not in a separate mode). The 32 bit 68k mode could be dropped for implementations which do not need compatibility. I believe better performance, security and 64 bit code density can be provided with a separate 64 bit mode. It should be possible to allow 32 bit mode processes for compatibility like ARM modes.

Not using a prefix would result in either a subset of instructions supported, every instruction having an alternative encoding (in most cases still requiring at least 32 bits for the opcode), or a new mode. Prefixes allow 64 bit code to be mostly compatible even though it's not optimal.
I think we've gone through this a few times before ;)

Quote:

The Apollo Core has 64 bit registers and limited 64 bit operations. The justification is probably that the SIMD instruction performance increases offset some of the slow down and transistor costs of 64 bit in the limited FPGA space. This shows that the cost of 64 bit is not much even though a SIMD unit with 64 bit registers and no floating point support likely has limited appeal.

Depends on what an aggressive 32 bit only design would provide. The Apollo design is mostly opaque so hard to tell what the overheads are.

Quote:

General purpose performance came from pipelining, caches, superscalarity, OoO, super-pipelining, clock increases and SMT/SMP. Clock increases have been important but have occurred gradually over time, mostly made possible by die shrinks. If power management is important, OoO, super-pipelining and clock increases may not be worthwhile.

Still, if we look at history the main driver has always been clock frequency. Out of order execution and speculative execution both have the advantage of allowing increased pipeline lengths and thus clock frequency.
At the time of the Alpha clock still ruled.

Quote:

Alpha architects were no doubt some of the best at that time and likely made greater contributions to technology than other teams. Their extreme designs were not always the best. These designs often had bottlenecks, were difficult to program and the ISA was primitive with one of the worst code densities of any RISC ISA ever.

It was designed for high level languages. Code density is only relevant (for the intended market) if it limits performance and it didn't.

Quote:

Die shrinks were the name of the game at that time and were partially responsible for the short lived performance holders. The 604 was performance king between the Alpha 21064 and 21064A which was also on .5 um process but doubled the I and D caches. Even benchmark code may have been falling out of the ICache on the 21064 slowing it to memory speed. The large caches of the 604 was much better for multitasking and server use while the Alpha 21064 caches were more appropriate for embedded applications with small reused code.

Looking for some data I find that the PPC 604 was introduced in the PowerMac line in May of 1995 at 120MHz, at the same time the 21164 was available at 300MHz.

Quote:

PPC would have had a highly clocked contender if the Exponential X704 had made it to market (533MHz target around 1997).

Exponential PPC X704
L1: 2kiB ICache direct mapped, 2kiB DCache direct mapped
L2: 32kiB 8 way
L3: 512kiB-2MiB optional external direct mapped

The L1 is tiny but completely eliminated the load-use penalty. There may have been room for more clocking up with these small caches although problems kept them from even achieving their initial target clock rating. Exponential was a startup that Apple strung along before cancelling their contract for breach of contract due to lower than estimated clock ratings.

I've always had some weird fascination with that and other fringe experiments. Note however that the 21164A shipped in 1997 at 600MHz using a standard CMOS process.

Quote:

Alpha showed the world how much heat is produced when clocking up which was more than many people thought. Unfortunately for DEC, the power of their cores kept them from entering the embedded market. Exponential also found itself without customers for its highly clocked chips. On the other hand, the startup P.A. Semi had embedded customers lined up for its low power PWRficient design and was taken out by Apple for this technology. Ironically, P.A. Semi was founded by Daniel Dobberpuhl (RIP 2019) the lead designer of the Alpha 21064.

DEC never really attempted to design a lower power processor, just lower powered versions still for workstations. The money driver in those days wasn't appliances and smartphones so it was likely the best choice.

Quote:

An L2 cache is usually unified. If there are 2 levels of separated caches, they are usually called an L0+L1 with a unified L2 cache, not that the latter is common or the terminology standardized.

I don't agree but maybe I'm too conservative. The Itanium is one example of an architecture with a two level instruction cache, and AFAIK IBM has a number of designs with multiple levels of instruction caches: the Itanium because it required it with its extremely bad instruction density, and the IBM designs because they are intended to run database code efficiently (okay, maybe that's one additional reason for Itanic too).

Quote:

I did mention performance. Keeping the DCache small and close is probably more important to performance, especially for an in order design.

Yep, and that's one reason why it's harder to make a good in order design in today's processes: physics makes deep pipelining almost impossible to avoid, and in order will often stall if cache reads aren't very low latency.
Itaniums wasted a lot of power on making data cache reads fast even with their comparatively low clocked designs. Impressive technically but...

Quote:

You can make the argument that the Alpha cores were throughput cores but, as far as I know, that is not how they were usually used. Weren't they used as high end PCs and workstations?

Sorry I was unclear - didn't mean it as a throughput design just one having a high throughput through high performance. From high performance execution core to high clocked buses with a high throughput memory subsystem even when that meant more power consumption.

Quote:

The SonicBOOM RISC-V core has done a good job of adding performance enhancing features without adding the complexity in the ISA. RISC-V open cores should gain market share for low to mid performance cores. The compressed extension is being supported by Linux and many new cores now. It will be interesting to see what they standardize on for SIMD support. I just can't get excited about it.

Me neither. Important for computing in general but... it's just meh, a cleaned and rewarmed MIPS.

Hammer 
Re: Apple moving to arm, the end of x86
Posted on 29-Jul-2020 4:12:41
#127 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5246
From: Australia

@MEGA_RJ_MICAL

Quote:

MEGA_RJ_MICAL wrote:
@Hammer

Quote:

Fixed-length instructions make it easier for superscalar pipelining.


That's what my granny always used to say!
Finally, It was about time someone pointed this out.

We can now shut down the forums once and for all,
confident that we have done all we could for the Amiga,
boats against the current, borne back ceaselessly into the past.

MRJM
THE END!

FYI, I'm not against Vampire V4 for the retro market segment, but be realistic on practical performance instead of engaging in speculative performance arguments.

I'm using Blender 3D 2.83 which needs plenty of raytracing performance and I'm using accelerated RT hardware.

paolone 
Re: Apple moving to arm, the end of x86
Posted on 30-Jul-2020 12:53:59
#128 ]
Super Member
Joined: 24-Sep-2007
Posts: 1143
From: Unknown

@Hammer, Hypex and others

I really like it when this kind of discussion turns into a war of technical jargon; however, I think you're completely missing the point here.


ANY modern processor - be it ARM or X64 - can be a good candidate for a desktop system, a laptop, a console. They all have the raw power to perform any kind of task a central processor is supposed to do. You need, however, a good GPU and a software framework to smartly schedule tasks to both of them, according to the type of task and its best fit.


Apple is now using its own processors because they think this will reduce their costs while keeping their products competitive. Only history will tell us, in the future, if they're right or wrong.


BTW: about retro market -- if there's a market for the Mega 65 (and there is), there can be a market even for Vampire accelerators.

Last edited by paolone on 30-Jul-2020 at 12:56 PM.

matthey 
Re: Apple moving to arm, the end of x86
Posted on 30-Jul-2020 20:16:13
#129 ]
Super Member
Joined: 14-Mar-2007
Posts: 1968
From: Kansas

Quote:

Hammer wrote:
FYI, I'm not against Vampire V4 for the retro market segment, but be realistic on practical performance instead of engaging in speculative performance arguments.


Are the arguments about the Apollo Core "speculative performance arguments"? The Apollo Core most resembles a 68060 which has some technical information available and can be mostly re-implemented.

Were you suggesting that the 68k won't have good performance because, "Fixed-length instructions make it easier for superscalar pipelining"?

1) The 68060 decodes and converts instructions to a fixed length RISC like encoding in the Instruction Fetch Pipeline before adding to an instruction buffer for the decoupled execution pipes. A single stage table lookup scheme is used for early decode of instructions. In the common case, decoding is easier on the 68k than x86 with the opcode and register/EA fields easier and quicker to find. Decoding problems for the 68k are long instructions and finding the instruction length with the 68020 ISA additions.

2) The 68060 only has a 4 byte/cycle instruction fetch yet can execute up to 3 instructions per cycle. All 32 bit fixed length instruction RISC processors can't be superscalar with this small of a fetch as this is only one instruction per cycle! Fetching smaller code left more bandwidth for data helping to allow a cheaper 32 bit data bus with similar performance to the 64 bit data buses of the competition. Instruction supply uses the most power on low end embedded processors at 42% according to a study. If more performance is desired, an 8 byte/cycle instruction fetch, 3rd integer pipe, dual ported data cache and 64 bit data bus should be potent and efficient.

As we can see, code compression is more of an advantage than a small latency increase in the pipeline. The transistor count for code compression is more than offset by the ICache and memory transistor savings with significant savings in power. The 68060 shows that decoding power was low even on a simple CPU back in the '90s where a complete system would have used less power than most similar performance RISC systems and where the decoding tax of x86 was still a high percentage of energy usage. Superscalar pipelining was no problem for the 68k and I will argue that the 68060 did it more efficiently than most of the competition based on PPA (Power, Performance, Area) data. This is *not* speculative!
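
The fetch bandwidth argument in point 2 comes down to simple arithmetic: sustained instruction supply is bounded by fetch bytes per cycle divided by average instruction length. A hedged Python sketch follows, where the 2 byte case assumes every 68k instruction is a single 16 bit word (the best case; real code averages longer). The peak of 3 instructions per cycle quoted above also relies on the instruction buffer and branch folding, which this sketch does not model.

```python
def fetch_limited_ipc(fetch_bytes_per_cycle, avg_instruction_bytes):
    """Upper bound on sustained instructions per cycle set by fetch bandwidth alone."""
    return fetch_bytes_per_cycle / avg_instruction_bytes


# 32 bit fixed length RISC with a 4 byte/cycle fetch: one instruction per cycle.
print(fetch_limited_ipc(4, 4))  # 1.0

# 68k with the same fetch, assuming (best case) all 16 bit instructions.
print(fetch_limited_ipc(4, 2))  # 2.0

# The RISC needs an 8 byte/cycle fetch to match that supply rate.
print(fetch_limited_ipc(8, 4))  # 2.0
```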

Quote:

I'm using Blender 3D 2.83 which needs plenty of raytracing performance and I'm using accelerated RT hardware.


The Amiga is way behind on technology. It can't catch up quickly without spending lots of money. Simpler mass produced hardware with (embedded) partners is the easier and less risky way to get a foot in the door. It would be fitting if the Amiga offered 3D realtime RT with the Amiga RT roots. The Amiga would need a miracle to make it happen anytime soon though.

Last edited by matthey on 30-Jul-2020 at 08:21 PM.

kolla 
Re: Apple moving to arm, the end of x86
Posted on 30-Jul-2020 20:51:29
#130 ]
Elite Member
Joined: 20-Aug-2003
Posts: 2859
From: Trondheim, Norway

@matthey

What RT roots? As far as I know, the OS offer no timing guarantees.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

matthey 
Re: Apple moving to arm, the end of x86
Posted on 30-Jul-2020 21:54:12
#131 ]
Super Member
Joined: 14-Mar-2007
Posts: 1968
From: Kansas

Quote:

kolla wrote:
What RT roots? As far as I know, the OS offer no timing guarantees.


Hammer was talking about "Ray Tracing" not Real Time. The Amiga was one of the earliest home computers to display a good enough picture to see the results of ray tracing so it became popular until the 68k was left behind as it requires considerable processing power. Real time ray tracing uses different and faster algorithms and specialized hardware to replace rasterized 3D rendering. Performance is now good enough to play 3D games at lower frame rates than rasterized hardware with room to improve the technology. The big advantage is more realistic lighting and shadows.

Hammer 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 3:46:50
#132 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5246
From: Australia

@matthey

Quote:

matthey wrote:
Quote:

Hammer wrote:
FYI, I'm not against Vampire V4 for the retro market segment, but be realistic on practical performance instead of engaging in speculative performance arguments.


Are the arguments about the Apollo Core "speculative performance arguments"? The Apollo Core most resembles a 68060 which has some technical information available and can be mostly re-implemented.

Were you suggesting that the 68k won't have good performance because, "Fixed-length instructions make it easier for superscalar pipelining"?

1) The 68060 decodes and converts instructions to a fixed length RISC like encoding in the Instruction Fetch Pipeline before adding to an instruction buffer for the decoupled execution pipes. A single stage table lookup scheme is used for early decode of instructions. In the common case, decoding is easier on the 68k than x86 with the opcode and register/EA fields easier and quicker to find. Decoding problems for the 68k are long instructions and finding the instruction length with the 68020 ISA additions.

2) The 68060 only has a 4 byte/cycle instruction fetch yet can execute up to 3 instructions per cycle. All 32 bit fixed length instruction RISC processors can't be superscalar with this small of a fetch as this is only one instruction per cycle! Fetching smaller code left more bandwidth for data helping to allow a cheaper 32 bit data bus with similar performance to the 64 bit data buses of the competition. Instruction supply uses the most power on low end embedded processors at 42% according to a study. If more performance is desired, an 8 byte/cycle instruction fetch, 3rd integer pipe, dual ported data cache and 64 bit data bus should be potent and efficient.

As we can see, code compression is more of an advantage than a small latency increase in the pipeline. The transistor count for code compression is more than offset by the ICache and memory transistor savings with significant savings in power. The 68060 shows that decoding power was low even on a simple CPU back in the '90s where a complete system would have used less power than most similar performance RISC systems and where the decoding tax of x86 was still a high percentage of energy usage. Superscalar pipelining was no problem for the 68k and I will argue that the 68060 did it more efficiently than most of the competition based on PPA (Power, Performance, Area) data. This is *not* speculative!

Quote:

I'm using Blender 3D 2.83 which needs plenty of raytracing performance and I'm using accelerated RT hardware.


The Amiga is way behind on technology. It can't catch up quickly without spending lots of money. Simpler mass produced hardware with (embedded) partners is the easier and less risky way to get a foot in the door. It would be fitting if the Amiga offered 3D realtime RT with the Amiga RT roots. The Amiga would need a miracle to make it happen anytime soon though.

I'm already aware of 68060's fixed instruction length conversion process. Being easier to decode argument is nearly pointless when it was Motorola/Freescale that gave up on 68K family while Intel, AMD and VIA continued with x86 family.

68060 has a dual integer decoder with dual integer pipelines and a separate FPU pipeline.

The current flagship X86 CPU architectures are AMD's Zen 2 and Intel's Coffee Lake/Ice Lake/Cascade Lake.

MEGA_RJ_MICAL 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 9:05:12
#133 ]
Super Member
Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

@matthey

Quote:

Hammer was talking about "Ray Tracing" not Real Time. The Amiga was one of the earliest home computers to display a good enough picture to see the results of ray tracing.

Performance is now good enough to play 3D games at lower frame rates than rasterized hardware with room to improve the technology. The big advantage is more realistic lighting and shadows.


Knowledgeable friend matthey,

could we hack Raytracing into Street Fighter II for our Amigaworld friend BigD?

I believe he is praying our Lord Jesus already for something to happen in that sense, and this would make for a nice addition: just think of the shadows Guile's hair would scatter around while being combed back at the end of a victorious fight!

It would also make Street Fighter II RT that long sought "killer app" that will project overnight our beloved amiga back into mainstream heaven.

Thanks,
MEGA RJM

Last edited by MEGA_RJ_MICAL on 01-Aug-2020 at 09:09 AM.

BigD 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 10:58:03
#134 ]
Elite Member
Joined: 11-Aug-2005
Posts: 7307
From: UK

@MEGA_RJ_MICAL

We've got a version of SFII that is pretty with good sound effects and one that is ugly but with good playability. Fightin' Spirit shows us that something better is possible. That is all! I'm sorry if talking sense offends you.

People associated with other platforms tend to just get on with making things better e.g. the recent home brew MegaDrive SFII Championship Edition Remastered, whereas some here tend to just moan and derail threads with their whimsy and sarcasm and to try and appear superior! I'm just saying, Violent Ken

Last edited by BigD on 01-Aug-2020 at 11:06 AM.
Last edited by BigD on 01-Aug-2020 at 11:05 AM.

_________________
"Art challenges technology. Technology inspires the art."
John Lasseter, Co-Founder of Pixar Animation Studios

MEGA_RJ_MICAL 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 13:32:50
#135 ]
Super Member
Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

@BigD

Quote:

BigD wrote:
some here tend to just moan and derail threads with their whimsy and sarcasm and to try and appear superior!


WHO ARE SUCH FIENDS!
LET'S BURY THEM UNDER A ROARING
"ZORRAM!!!!!"


mrjm

matthey 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 16:02:30
#136 ]
Super Member
Joined: 14-Mar-2007
Posts: 1968
From: Kansas

Quote:

Hammer wrote:
I'm already aware of 68060's fixed instruction length conversion process. Being easier to decode argument is nearly pointless when it was Motorola/Freescale that gave up on 68K family while Intel, AMD and VIA continued with x86 family.


1) When the 68000 came out, Intel was worried it would put them out of business.

Quote:

I can tell you at Intel it [the 68000] was pretty electrifying too... it was terrifying.


- David House of Intel in "Motorola 68000 Oral History Panel"

2) A few years later it was thought RISC processors would replace both the 68k and x86. The 68k was replaced by RISC while Intel stayed with the x86 as the PC clone market sustained them. The grass wasn't so green on the other side of the fence for Motorola/Freescale as they lost their embedded market dominance to ARM, nearly went bankrupt and were bought by NXP.

3) A few years later it was thought that VLIW processors would replace the x86 and Intel was hooked this time. Once again, the grass wasn't as green on the other side of the fence. It was a costly Itanic mistake of EPIC proportions that let AMD gain market share and become a major competitor (AMD went from ~0% of server market to nearly 25%).

4) x86 and its 64 bit variant x86_64 processors became the dominant high performance processors in the world.

We can see a roller coaster ride of politics and hype. The x86 often looked inferior enough to look for alternatives (1-3 above) yet came out on top for performance in the end. Is it possible the discarded 68k is better in many comparisons and often by a significant margin?

68k
easier to decode, fewer instructions to execute, better code density, less memory traffic, fewer branches, more GP registers, more powerful addressing modes

x86
shorter average instruction size, simpler instructions to execute (both of these advantages diminished with x86_64)

Could this have something to do with the 68060 outperforming the Pentium?

Pentium@75MHz 80502, 3.3V, 0.6um, 3.2 million transistors, 9.5W max
68060@75MHz 3.3V, 0.6um, 2.5 million transistors, ~5.5W max

Performance was similar despite the 68060 being a less aggressive design for performance (fewer transistors, smaller instruction fetch and narrower data bus). The 68060 dominated in overall PPA (Power, Performance and Area) which is important in evaluations of embedded processors today. If x86 used less power, it would be used in more embedded devices today and ARM would not be nipping at its heels.
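As a rough back-of-the-envelope check on the PPA point, the figures above can be turned into ratios. This is only a sketch: it assumes "performance was similar" means exactly equal performance (normalized to 1.0), and the names `chips`, `ASSUMED_PERF`, `ppa` and `advantage` are made up for illustration.

```python
# Figures quoted above (both parts at 75 MHz, 3.3 V, 0.6 um).
# The 68060 power figure is approximate ("~5.5W max").
chips = {
    "Pentium": {"transistors_m": 3.2, "max_watts": 9.5},
    "68060": {"transistors_m": 2.5, "max_watts": 5.5},
}

# ASSUMPTION: "performance was similar" is modeled as equal performance.
ASSUMED_PERF = 1.0


def ppa(chip):
    """Return (performance per watt, performance per million transistors)."""
    return (ASSUMED_PERF / chip["max_watts"],
            ASSUMED_PERF / chip["transistors_m"])


def advantage(a, b):
    """Ratio of chip a's two PPA figures over chip b's."""
    pa, pb = ppa(chips[a]), ppa(chips[b])
    return pa[0] / pb[0], pa[1] / pb[1]


per_watt, per_transistor = advantage("68060", "Pentium")
print(f"68060 vs Pentium: {per_watt:.2f}x perf/W, "
      f"{per_transistor:.2f}x perf/transistor")
```

Under that equal-performance assumption the 68060 comes out ahead on both power and area; any real benchmark difference would scale both ratios by the same factor.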

Quote:

68060 has a dual integer decoder with dual integer pipelines and a separate FPU pipeline.


The FPU pipeline is not separate. The primary integer pipe executes the first 3 stages (Decode, EA Calc, EA Fetch) and then hands off to the FPU for a single FPU execute stage. The 68060 is in-order and only 2-wide issue, but both integer pipes can continue to execute integer instructions while a multi-cycle FPU instruction executes. A predicted-taken branch (loop) can be folded away, which counts as another instruction executed (ColdFire and the Apollo Core expanded the use of code folding). Of course, each integer pipe can execute a more powerful instruction than a RISC pipe can. Classic RISC can require a half dozen instructions with bubbles to do the same work and OoO can't fix this, as AArch64 acknowledges by adding more complex addressing modes which look like they were copied from the 68k.

Quote:

The current flagship X86 CPU archs are AMD's Zen 2 and Intel's Coffee Lake/Ice Lake/Cascade Lake.


Development costs were likely larger than the GDP of several small countries too. It is possible to make pigs fly with enough money.

Last edited by matthey on 01-Aug-2020 at 06:44 PM.
Last edited by matthey on 01-Aug-2020 at 06:15 PM.
Last edited by matthey on 01-Aug-2020 at 04:08 PM.
Last edited by matthey on 01-Aug-2020 at 04:05 PM.

Hypex 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 16:57:40
#137 ]
Elite Member
Joined: 6-May-2007
Posts: 11180
From: Greensborough, Australia

@matthey

Quote:
x86 didn't have as many choices due to lack of encoding space. It was either replace instructions (they did some of this too), add pre-fixes or re-encode the instructions (not binary compatible but probably the best choice for the mess). The 68k can split open encodings without prefixes or postfixes although this also makes instructions longer.


I suppose, like those 80's movie cops that did it by the book, Intel needed to do it by the byte. Plus other things like real mode and virtual. The 68K would need longer encodings, but it needs to be even, doing it by the word.

Quote:
That is my logic too. Most instructions should support byte (8 bit), word (16 bit), long (32 bit) or quad (64 bit) using the same instruction format. The 68k size bitfield is 2 bits so 2^2=4 sizes which works out perfectly. Unfortunately, some of the encodings which could be used for the 64 bit size were reused for other instructions. Many of these instructions are newer 68020 instructions which can be eliminated and the others re-encoded. A 64 bit mode allows to clean up the mess.


I would have expected them to use the size bitfield for other things as it's never that easy. I don't know if they planned that far ahead in the late 70's. The 68K did get some 64-bit support later on, with results divided into two registers. Not the most perfect solution, a sign it was busting at the seams. Of course, with a 64-bit mode that reuses some encodings, more transistor space is needed.
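To make the size bitfield point concrete, here is a minimal sketch of decoding it. It assumes the common case where the size sits in bits 7-6 of the opcode word (as in TST, CLR, NEG and similar); the `SIZES_64BIT_MODE` table is purely hypothetical and only illustrates the proposed 64 bit mode, it is not an existing encoding.

```python
# Size field as used by many 68k instructions: bits 7-6 of the opcode word.
SIZES_68K = {0b00: ("byte", 8), 0b01: ("word", 16), 0b10: ("long", 32)}

# HYPOTHETICAL: a 64 bit mode that reclaims %11 as "quad".
SIZES_64BIT_MODE = {**SIZES_68K, 0b11: ("quad", 64)}


def size_field(opcode):
    """Extract the 2 bit size field from a 16 bit opcode word."""
    return (opcode >> 6) & 0b11


def decode_size(opcode, table=SIZES_68K):
    """Return (name, bits), or None if %11 is not a size in this table."""
    return table.get(size_field(opcode))


# TST.B D0, TST.W D0, TST.L D0, then the %11 pattern in the same slot.
for op in (0x4A00, 0x4A40, 0x4A80, 0x4AC0):
    print(f"${op:04X}: 68k={decode_size(op)} "
          f"hypothetical 64 bit mode={decode_size(op, SIZES_64BIT_MODE)}")
```

On a real 68k, $4AC0 is TAS D0 rather than a fourth TST size, which is exactly the kind of reused encoding a 64 bit mode would have to move out of the way.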

Quote:
The x86_64 often does *not* support 64 bit immediates requiring a MOV+OP. Most 32 bit operations have their results sign extended to 64 bits which is different behavior than for 16 bit or 8 bit operations although it works out well. Writing the whole register makes result forwarding easier but can use more power processing those upper bits. The 68k always sign extends address register results to the address register size giving them a performance advantage over data registers. Data registers are more flexible and I would rather keep the handling of different data sizes consistent, unlike x86_64.


That's a good point. The 68K has been criticised in places for not having GPRs. But moving a word into an address register does have the effect of clearing the upper word and being able to do tricks by moving around data and not have the CC affected. Thinking about it, the usual sizes should work as usual, since the .b, .w and .l instructions wouldn't know of anything larger. Even if the registers on file are 64-bit, they should only use the size in the instruction, to both be correct as well as use fewer cycles when it isn't needed.
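A small sketch of the two write behaviours being discussed, under stated assumptions: the 64 bit register width is hypothetical, and `write_data_reg` / `write_addr_reg` are illustrative helper names. Data register writes touch only the low bits named by the size suffix, while address register writes are sign extended to the full register, as matthey's quoted text describes.

```python
MASK64 = (1 << 64) - 1  # HYPOTHETICAL 64 bit register file


def write_data_reg(old, value, bits):
    """Data register style: replace only the low `bits`, keep the rest."""
    low = (1 << bits) - 1
    return (old & ~low & MASK64) | (value & low)


def write_addr_reg(value, bits, width=64):
    """Address register style: sign extend a `bits` wide source to `width`."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):      # top bit of the source set: negative
        value -= 1 << bits
    return value & ((1 << width) - 1)


d0 = 0x1122334455667788
print(hex(write_data_reg(d0, 0xAB, 8)))     # .b write keeps the upper 56 bits
print(hex(write_data_reg(d0, 0xFFFF, 16)))  # .w write keeps the upper 48 bits
print(hex(write_addr_reg(0x1234, 16)))      # positive word: upper bits end up zero
print(hex(write_addr_reg(0x8000, 16)))      # negative word: upper bits end up set
```

For a positive word the sign extension looks the same as clearing the upper bits, which is why the two descriptions often get used interchangeably.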

Another thing is an increase of both address and data register banks, but this may be more problematic than 64-bit, since five bits are needed over four in encodings.

FPU is another subject. Haven't looked into x87 lately, but the 68882 and built in FPU was superior to 64-bit with 80-bit width in the 80's. Should it be trimmed down to be in alignment or kept? By itself there are 8 extra registers needed. Add vectors and then you need another 8, likely wider than FPU, at least 128-bit. If the FPU is using $Fxxx codes then there goes the %11 needed for new 64-bit size, for the %1111 case at least.

matthey 
Re: Apple moving to arm, the end of x86
Posted on 1-Aug-2020 22:18:19
#138 ]
Super Member
Joined: 14-Mar-2007
Posts: 1968
From: Kansas

Quote:

Hypex wrote:
I suppose, like those 80's movie cops that did it by the book, Intel needed to do it by the byte. Plus other things like real mode and virtual. The 68K would need longer encodings, but it needs to be even, doing it by the word.


There was good reason to encode by the byte in ancient computer history. Instructions and data were fetched in small chunks, sometimes as small as a byte, so small instructions could sometimes start executing sooner (likewise, little endian sometimes allowed narrow ALU calculations to start sooner with the least significant data). There are only 2^8=256 byte encodings but the 8086 didn't have many registers (some of which were used implicitly, improving code density), often operated on the stack and didn't have many data sizes to support. The 8086 can have better code density than even the 68k for byte (text) processing but this is with lots of memory traffic and small instructions to execute. Optimizing for size on the x86 and x86_64 often results in poor performance as the shortest instruction encodings were chosen in the 8086. The 68k 16 bit encoding has 2^16=65536 base encodings which allowed for a more general purpose processor supporting more registers and data sizes.

An example mostly text program which has better code density on the 8086 than the 68k has the following metrics when optimizing for size on the 68k, x86 and x86_64.

68k 156 instructions, 394 bytes of code, 48 memory accesses
x86 224 instructions, 495 bytes of code, 106 memory accesses
x86_64 227 instructions, 520 bytes of code, 112 memory accesses

x86 and x86_64 force a choice between performance optimization (new longer encodings) and size optimization (old specialized byte and stack encodings), with an expected large difference in performance, while the 68k can expect good performance when optimizing for size, giving the best of both worlds. This is another reason to like the 68k for the embedded market where size optimization is more common and the 68k has a large advantage in energy savings.
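The ratios behind those three lines are easy to work out. A quick sketch using only the numbers quoted above (the `metrics` table and `summarize` helper are just illustrative names):

```python
# Metrics quoted above for the size-optimized sample program.
metrics = {
    "68k": {"instructions": 156, "bytes": 394, "mem_accesses": 48},
    "x86": {"instructions": 224, "bytes": 495, "mem_accesses": 106},
    "x86_64": {"instructions": 227, "bytes": 520, "mem_accesses": 112},
}


def summarize(name, base="68k"):
    """Average instruction size plus ratios relative to the base ISA."""
    m, b = metrics[name], metrics[base]
    return {
        "avg_insn_bytes": m["bytes"] / m["instructions"],
        "code_size_vs_base": m["bytes"] / b["bytes"],
        "insn_count_vs_base": m["instructions"] / b["instructions"],
        "mem_accesses_vs_base": m["mem_accesses"] / b["mem_accesses"],
    }


for name in metrics:
    s = summarize(name)
    print(f"{name:7s} {s['avg_insn_bytes']:.2f} bytes/insn, "
          f"{s['code_size_vs_base']:.2f}x code size, "
          f"{s['insn_count_vs_base']:.2f}x instructions, "
          f"{s['mem_accesses_vs_base']:.2f}x memory accesses")
```

For this one program the x86 variants do have the shorter average instruction, but they need enough extra instructions that the total code size and the memory access count still end up higher than on the 68k.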

The Cast BA2 ISA uses 16, 24, 32 and 48 bit instruction encodings which gives great code density although instructions at odd addresses don't appeal to me. I expect the 68020 ISA and Thumb 2 with variable length 16 bit encodings can come close in code density without instructions at odd addresses. The 68k ISA has room for code density improving enhancements too.

Quote:

I would have expected them to use the size bitfield for other things as it's never that easy. I don't know if they planned that far ahead in the late 70's. The 68K did get some 64-bit support later on, with results divided into two registers. Not the most perfect solution, a sign it was busting at the seams. Of course, with a 64-bit mode that reuses some encodings, more transistor space is needed.


At introduction in 1979, the 68000 had a 32 bit ISA at a time when most CPUs were still 8 bit (the 16 bit 8086 was introduced in 1978 but it was still optimized for 8 bit). It would have been difficult to imagine that more than 4GiB of address space would ever be useful, memory would be cheap enough to make it possible or that programs would be wasteful enough to make it desirable, especially with the great code density of the 68k. Ironically, the 68k ISA aged better than the competition yet is used less. Old ARM software needs to be patched to run on new ARM hardware because it only had 26 bit addressing with the upper bits of the PC register used by the Processor Status Register. x86/x86_64 CPUs often support everything but at a cost in complexity. It's easier to use an emulator like DOSBox for old software on x86/x86_64. The 68k can use the whole 32 bit address space and can address it all more efficiently than most modern CPUs. On the Amiga, most software incompatibility problems come from banging the hardware, which was sometimes necessary because the AmigaOS was not optimized or bug free enough.
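For anyone who hasn't seen the old ARM layout, here is a minimal sketch of how the 26 bit ARM R15 packed the program counter and status bits into one register. The field positions are the classic 26 bit layout as I understand it; `unpack_r15` and the dictionary keys are illustrative names only.

```python
def unpack_r15(r15):
    """Split a 26 bit ARM R15 value into PC and status fields."""
    return {
        "flags_nzcv": (r15 >> 28) & 0xF,   # bits 31-28: N, Z, C, V
        "irq_disable": (r15 >> 27) & 1,    # bit 27: I
        "fiq_disable": (r15 >> 26) & 1,    # bit 26: F
        "pc": r15 & 0x03FFFFFC,            # bits 25-2: word aligned PC
        "mode": r15 & 0b11,                # bits 1-0: processor mode
    }


ADDRESS_SPACE_26 = 1 << 26   # 64 MiB reachable by the 26 bit PC
ADDRESS_SPACE_32 = 1 << 32   # 4 GiB for a full 32 bit PC

fields = unpack_r15(0xF0008003)
print(fields)
print(ADDRESS_SPACE_32 // ADDRESS_SPACE_26, "times more address space with 32 bit")
```

Code that read or wrote the flags through R15 this way is what stops working once the PC takes over all 32 bits.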

Quote:

That's a good point. The 68K has been criticised in places for not having GPRs. But moving a word into an address register does have the effect of clearing the upper word and being able to do tricks by moving around data and not have the CC affected. Thinking about it, the usual sizes should work as usual, since the .b, .w and .l instructions wouldn't know of anything larger. Even if the registers on file are 64-bit, they should only use the size in the instruction, to both be correct as well as use fewer cycles when it isn't needed.


When the 68k was introduced, it was praised for its abundance of GP registers and when RISC came out it was criticized. RISC needs more than 16 registers for performance and the 68k doesn't. The 68k with 16 registers is within a few percent of the memory traffic (including register spills) of most 32 register RISC CPUs. The 68k often has less memory traffic, fewer instructions to execute and better code density than most compressed RISC encodings like Thumb 2 or microMIPS as they often decrease or restrict the number of available registers. x86_64 cores perform well with 16 registers.

Quote:

Another thing is an increase of both address and data register banks, but this may be more problematic than 64-bit, since five bits are needed over four in encodings.


It would be possible to add another 8 data registers in a separate mode but it would be difficult to get the kind of consistency I'd like to see without compromises. Gunnar wanted to add them without a separate mode which the majority of developers opposed back in the Natami forum days to his angst. Address registers are more difficult to add without major changes to the encodings, although Gunnar added them too. I made an encoding proposal for increasing the FPU registers to 16 with 3 op instruction variations but it was rejected by Gunnar because it didn't have enough registers. It's ironic that the 68k Apollo Core is adding registers everywhere while the Tabor PPC CPU is reducing registers to save energy. Freescale was wanting to reduce power to compete with ARM while Gunnar wants to compete against x86_64 on the desktop, while in an FPGA. I guess core design is all about priorities no matter how warped.

Quote:

FPU is another subject. Haven't looked into x87 lately, but the 68882 and built in FPU was superior to 64-bit with 80-bit width in the 80's. Should it be trimmed down to be in alignment or kept? By itself there are 8 extra registers needed. Add vectors and then you need another 8, likely wider than FPU, at least 128-bit. If the FPU is using $Fxxx codes then there goes the %11 needed for new 64-bit size, for the %1111 case at least.


An 80 bit FPU has advantages and disadvantages. It can reduce cumulative errors, can reduce costly overflow and underflow exceptions and simplifies some algorithms, reducing the number of instructions and improving code density, but the wider ALU can add a little latency and, yes, 64 bits is more efficient to access in memory. We already have the 68k FPU and it is nice so I would prefer to keep it for compatibility. Decreasing the precision can occasionally cause problems, as I warned Gunnar, and it did (Frank Wille reported FPU errors to me from the 68k FPU libraries I worked on, which I discovered were caused by his UAE 64 bit FPU emulation). The 80 bit registers could be kept and shared with the registers of a 128 bit or wider SIMD unit like IBM did in a smart way with POWER.

Hammer 
Re: Apple moving to arm, the end of x86
Posted on 2-Aug-2020 0:44:31
#139 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5246
From: Australia

@matthey

1. The clone PC business (i.e. the microcomputer's "VHS" business model) with the unified MS-DOS ecosystem beat a non-unified 68K microcomputer ecosystem.

The IBM PC was a factor in establishing PC standards and gave Microsoft's bought-out Q-DOS its inertia. PC clones and the PC's business apps held the line for the PC platform despite the 68000's superiority over the 8086 and 80286. The 80386 rivals the 68000 and 68020.

MS has both Xenix and MS-DOS.

2. On November 1, 1995, Intel's Pentium Pro showed the future for X86 with an X86-RISC hybrid. AMD followed with its own X86-RISC hybrid, the K5, on March 27, 1996.

3. MS supported AMD64 (x86-64) which doomed Intel/HP's Itanium (IA-64) and IBM's PowerPC 970 (earlier, IBM's PowerPC 620 was an X86-Power64 hybrid; MS said it wouldn't support it, hence killing it).

4. Intel's and AMD's Ghz race road-killed many RISC based desktop/workstation platforms in the late 1990s.
-----

In the early 1990s before Amiga 4000, I was running with the following hardware

Amiga 3000 with 68030/68882 at 25Mhz and Amiga OS 2.0x. 6 MB RAM.

IBM PS/2 Model 55SX with 80386SX at 16Mhz and 387 at 33Mhz (jumpered to 33Mhz from 25 Mhz part). 2nd hand from the government's surplus. 5MB RAM.

386 PC clone with 80386DX33 at 33Mhz, 387 at 33Mhz with L2 cache on the motherboard + Tseng Labs SVGA. My primary machine to run the 1990s Doom type games.

The software that I was running was Imagine 3D 2.0 on both Amiga and PC; my Amiga 3000 had a slight raytracing edge over the 55SX.

Around 1996, I bought my classic Pentium 150(OC 166Mhz) with S3 Trio64 SVGA when Amiga's Cybergraphics S3 Trio equivalent was a ripoff!

Amiga 4000 with 68060 costs more than my classic Pentium 150 (OC 166Mhz) with S3 Trio64 SVGA. Amiga 4000's AGA was inferior to S3 Trio64 SVGA. It's Quake at this point.


Around 1998, I had NVIDIA Riva and TNT2 M64 cards on my Celeron 300A (OC to 450 Mhz with 100Mhz FSB). It's Quake GL at this point.

-----

Why compare a classic Pentium at 75 Mhz against a 68060 at 75 Mhz when my classic Pentium PC ran at 150Mhz, OC to 166Mhz (simple jumper FSB setting)? Higher clock speed is a feature.

The embedded argument does nothing for my raytracing and competitive gaming workloads. I have 4.5KW solar panels on my roof, hence the embedded argument is moot.


------
The 68K family members weren't 100% compatible with each other, which is a PITA. My recent encounter with 68020 vs 68040 libraries reminded me of that PITA.

Last edited by Hammer on 02-Aug-2020 at 01:51 AM.
Last edited by Hammer on 02-Aug-2020 at 01:43 AM.
Last edited by Hammer on 02-Aug-2020 at 01:39 AM.
Last edited by Hammer on 02-Aug-2020 at 01:30 AM.
Last edited by Hammer on 02-Aug-2020 at 01:22 AM.
Last edited by Hammer on 02-Aug-2020 at 01:20 AM.
Last edited by Hammer on 02-Aug-2020 at 12:47 AM.
Last edited by Hammer on 02-Aug-2020 at 12:45 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Hammer 
Re: Apple moving to arm, the end of x86
Posted on 2-Aug-2020 2:13:59
#140 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5246
From: Australia

@matthey

Quote:

matthey wrote:
Quote:

kolla wrote:
What RT roots? As far as I know, the OS offer no timing guarantees.


Hammer was talking about "Ray Tracing" not Real Time. The Amiga was one of the earliest home computers to display a good enough picture to see the results of ray tracing so it became popular until the 68k was left behind as it requires considerable processing power. Real time ray tracing uses different and faster algorithms and specialized hardware to replace rasterized 3D rendering. Performance is now good enough to play 3D games at lower frame rates than rasterized hardware with room to improve the technology. The big advantage is more realistic lighting and shadows.

The major component with raytracing is the search problem with intersecting branch math. Bounding volume hierarchy (BVH) tree structure is the search component.
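To show what the BVH search looks like, here is a minimal toy sketch, not how any GPU or game engine actually implements it. It assumes axis-aligned boxes, a simplistic median split along x (real builders use smarter heuristics), and it stops at "which leaf boxes does the ray touch" without the final ray/triangle test. All names (`Node`, `hits_box`, `build`, `traverse`) are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec
    hi: Vec
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[str] = None  # set on leaves only


def hits_box(origin, direction, lo, hi):
    """Slab test: does the ray origin + t*direction (t >= 0) cross the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must already be inside it.
            if o < l or o > h:
                return False
            continue
        t1, t2 = (l - o) / d, (h - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
        if t_near > t_far:
            return False
    return True


def build(prims):
    """Build a BVH from (name, lo, hi) boxes with a median split along x."""
    if len(prims) == 1:
        name, lo, hi = prims[0]
        return Node(lo, hi, prim=name)
    prims = sorted(prims, key=lambda p: p[1][0] + p[2][0])
    mid = len(prims) // 2
    left, right = build(prims[:mid]), build(prims[mid:])
    lo = tuple(min(a, b) for a, b in zip(left.lo, right.lo))
    hi = tuple(max(a, b) for a, b in zip(left.hi, right.hi))
    return Node(lo, hi, left, right)


def traverse(root, origin, direction):
    """Return (leaf names whose boxes the ray hits, number of boxes tested)."""
    hits, tested, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        tested += 1
        if not hits_box(origin, direction, node.lo, node.hi):
            continue  # the branch math: a miss prunes the whole subtree
        if node.prim is not None:
            hits.append(node.prim)
        else:
            stack.extend((node.left, node.right))
    return hits, tested


# Four unit cubes spaced along x; the ray travels along +z through the third.
boxes = [(n, (x, 0.0, 0.0), (x + 1.0, 1.0, 1.0))
         for n, x in (("a", 0.0), ("b", 10.0), ("c", 20.0), ("d", 30.0))]
hits, tested = traverse(build(boxes), (20.5, 0.5, -5.0), (0.0, 0.0, 1.0))
print(hits, tested)
```

With only four boxes the pruning saves little, but the same traversal on millions of triangles is what turns the search into roughly logarithmic work per ray, and it is this box test and branch pattern that RT hardware accelerates.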

Xbox Series X's Forza Motorsport 8 has 4K resolution at 60 fps with raytracing (via RDNA 2 GPU with 52 CUs and DXR Tier 1.1).

Cross-generation games such as Battlefield V have additional raytracing overhead when they switch between BVH and legacy non-BVH tree structures. Also, DXR Tier 1.1 has a higher efficiency when compared to DXR Tier 1.0. Battlefield V used DXR Tier 1.0.

I use Blender 3D 2.83's hardware-accelerated RT on RTX 2080/2080 Ti cards to speed up raytracing, which is faster than a 64-core Zen 2. Blender's 3D engine was reworked for BVH structures.




_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle