matthey 
Re: 68k Developement
Posted on 23-Oct-2018 22:59:24
#461 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2009
From: Kansas

Quote:

cdimauro wrote:
It depends on how you organize your 68K_64 ISA.

x64 by default zero-extends all 32-bit operations, and the LEA can be forced to generate 32-bit addresses, so it's easy to have/handle 32-bit pointers.


It may be easy to handle 32 bit pointers but it is also easy for existing 32 bit code to mishandle 32 bit pointers in a 64 bit CPU.

Quote:

As I said some time ago, I don't know how convenient it is to invest in a traditional FPU nowadays.

SIMD units support scalar operations too (because they are needed even when using packed data), so you basically have scalar floating point instructions "for free". For some embedded market you can just disable/don't implement the packed versions.

The problem with SIMD units is that they take a lot of space for the encoding. If you still want to use the EA as the second source operand (so basically following "the CISC way"), then it's likely that you need to go for 6-byte opcodes, or drastically drop some features.

I'm for completely reusing the line-A for SIMD packed and the line-F for SIMD scalar operations on the new ISA. With 16-bit registers, no masks, and the vector length selectable at runtime it might fit in 32-bit opcodes.


I know it is possible to have an SIMD unit or vector FPU which also handles scalars (the existing 68k FPU is not a good candidate to turn into a vector FPU). Many nice FPU features have been dropped to allow fast SIMD operations and then they are usually not as easy to use. An SIMD unit wants to flush to zero where an FPU handles denormals/subnormals for example. It is possible to have many settings and modes to handle such cases but it is not unreasonable to keep the SIMD unit and FPU separate (lean and mean SIMD unit with a high precision and easy to use FPU). The 68k FPU enhancement with 16 FPU registers and 3 OP still used a minimal amount of F-line encoding space, keeps several existing encodings like FBcc, can trap or implement existing 68k FPU instructions and has room to add more fp instructions. This is only a fraction of the encoding space required by an SIMD unit. It would use a fair amount of transistors but power gating works well here as well as on the SIMD unit. The registers could be shared between the SIMD unit and FPU like the POWER ISA.
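For illustration, the flush-to-zero versus gradual-underflow trade-off mentioned above can be toggled on an x86-64 SIMD unit through the MXCSR FTZ/DAZ bits. A minimal C sketch, assuming GCC or Clang on x86-64 where scalar float math goes through SSE:

Code:

#include <stdio.h>
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE (SSE) */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (SSE3) */

int main(void)
{
    volatile float tiny  = 1e-38f;  /* close to the smallest normal float */
    volatile float scale = 1e-3f;

    /* Default IEEE behaviour: the product underflows gradually to a
       denormal/subnormal value instead of going straight to zero. */
    printf("gradual underflow: %g\n", (double)(tiny * scale));

    /* Typical SIMD-friendly setup: flush tiny results to zero and treat
       denormal inputs as zero, trading precision for speed. */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    printf("flush to zero:     %g\n", (double)(tiny * scale));
    return 0;
}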

matthey 
Re: 68k Developement
Posted on 24-Oct-2018 0:17:48
#462 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2009
From: Kansas

Quote:

OneTimer1 wrote:
This babbling about non-existing possibilities is sick; it's like geeks masturbating about retro technology. The thread started with someone asking: "Why has Motorola abandoned this CPU?"
The answers were given in the first few postings; the topic has now drifted away to fantasy architectures that no one could use in an existing 68k system.


Tell us the limitations which give "non-existing possibilities".

Quote:

If the Apollo team wanted to be taken seriously outside the Amiga retro market, they must make this CPU Linux compatible; they will need an MMU and a reliable, flawless FPU. An SPI and I2C interface would be needed too.


An MMU and FPU are not necessary for some embedded markets but useful for the most lucrative ones. The Apollo core FPU needs full precision for maximum 68k FPU compatibility. UAE has had similar problems with reducing the precision to double precision. Many new algorithms work fine with double precision but some older algorithms use extended precision to reduce the number of fp instructions and improve precision.

Quote:

But the FPGA is much too expensive to compete with an ARM, though maybe not for customers who need 68k compatibility. And customers needing a 64-bit CPU could easily switch to ARM; they don't need 68k compatibility.


ARM CPUs are cheap because of economies of scale in embedded CPUs. The 68k could take advantage of this by finding embedded partners and customers too. I was talking to a 68k-friendly embedded CEO who could have internet-stack-capable CPUs made for about $0.03 U.S. each in the quantities they needed. I wrote more about the ASIC costs in the following link.

https://amigaworld.net/modules/newbb/viewtopic.php?topic_id=42886&forum=17&3#817558

Gunnar saw only limitations where I was finding possibilities. It is unfortunate he created his designs with these limitations and was oblivious to what I was trying to do.

cdimauro 
Re: 68k Developement
Posted on 24-Oct-2018 6:00:18
#463 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@matthey Quote:
matthey wrote:
Quote:
cdimauro wrote:
It depends on how you organize your 68K_64 ISA.

x64 by default zero-extends all 32-bit operations, and the LEA can be forced to generate 32-bit addresses, so it's easy to have/handle 32-bit pointers.

It may be easy to handle 32 bit pointers but it is also easy for existing 32 bit code to mishandle 32 bit pointers in a 64 bit CPU.

No, existing 32-bit code cannot create any problem, because working with 32-bit data always clears the upper 32 bits in a 64-bit CPU.
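As an illustration of both sides of this exchange, a small C sketch (assuming an LP64 compiler on a 64-bit machine):

Code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 32-bit operations produce zero-extended results, so pure 32-bit
       arithmetic cannot leave garbage in the upper half of a register. */
    uint32_t a = 0xFFFFFFFFu;
    uint64_t wide = (uint64_t)(a + 1u);   /* 32-bit add wraps to 0; upper bits stay 0 */
    printf("wide = %llu\n", (unsigned long long)wide);

    /* But old code that squeezes a pointer into 32 bits only survives if
       the allocation happens to live below 4 GiB. */
    void *p = malloc(16);
    uint32_t low = (uint32_t)(uintptr_t)p;   /* truncated pointer */
    void *back   = (void *)(uintptr_t)low;   /* may no longer equal p */
    printf("pointer round-trip: %s\n", back == p ? "ok" : "broken");
    free(p);
    return 0;
}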
Quote:
Quote:
As I said some time ago, I don't know how convenient it is to invest in a traditional FPU nowadays.

SIMD units support scalar operations too (because they are needed even when using packed data), so you basically have scalar floating point instructions "for free". For some embedded market you can just disable/don't implement the packed versions.

The problem with SIMD units is that they take a lot of space for the encoding. If you still want to use the EA as the second source operand (so basically following "the CISC way"), then it's likely that you need to go for 6-byte opcodes, or drastically drop some features.

I'm for completely reusing the line-A for SIMD packed and the line-F for SIMD scalar operations on the new ISA. With 16-bit registers, no masks, and the vector length selectable at runtime it might fit in 32-bit opcodes.


I know it is possible to have an SIMD unit or vector FPU which also handles scalars (the existing 68k FPU is not a good candidate to turn into a vector FPU). Many nice FPU features have been dropped to allow fast SIMD operations and then they are usually not as easy to use. An SIMD unit wants to flush to zero where an FPU handles denormals/subnormals for example. It is possible to have many settings and modes to handle such cases

I don't remember now, but AFAIR at least AVX-512 has specific settings and/or instructions to handle it.

Consider that the old x87 FPU is deprecated on x64 in 64-bit mode: SSE should be used instead. So I assume that SSE is also able to handle de/subnormals.
Quote:
but it is not unreasonable to keep the SIMD unit and FPU separate (lean and mean SIMD unit with a high precision and easy to use FPU). The 68k FPU enhancement with 16 FPU registers and 3 OP still used a minimal amount of F-line encoding space, keeps several existing encodings like FBcc, can trap or implement existing 68k FPU instructions and has room to add more fp instructions. This is only a fraction of the encoding space required by an SIMD unit. It would use a fair amount of transistors but power gating works well here as well as on the SIMD unit. The registers could be shared between the SIMD unit and FPU like the POWER ISA.

It can be a reasonable solution. The important thing is that SIMD and FPU instructions can freely access the same (size-extended, for SIMD) register set without any issue (on x86 you had to use specific, special instructions to switch between the FPU and MMX, so a context switch is required, which slows down execution).

Hypex 
Re: 68k Developement
Posted on 24-Oct-2018 14:01:13
#464 ]
Elite Member
Joined: 6-May-2007
Posts: 11215
From: Greensborough, Australia

@cdimauro

Quote:
I've already provided plenty of details with my previous post, so I hope that it's not necessary anymore.


Oh you have and I'm fine with the details you have provided. I was only pointing out this statement of yours. Just saying.

Quote:
It's a general question, which applies to pixel alignment as well. I'll talk specifically and more clearly after.


Quote:
Same for me: I prefer 8, 16 and 32 bits packed pixels. But if we need to optimize space and/or bandwidth usage (like we were discussing), then they should be considered.


Fair enough. Also would be good if compositing was used. Though that kind of hardware blitting doesn't usually place those kind of limits on asset data.

Quote:
More or less. But the problem is mostly related to the 68000, which cannot access unaligned data (hence the read 2 bytes + pack operation, and unpack + write 2 bytes).


I see that, yes. This reminds me of the coding of compression algorithms on the Amiga. It makes sense to shift a whole binary value in place where possible. But on the Amiga they used to do it by the bit: shift it into the C flag and rotate it in, LSR then ROXL usually. A 1 was preloaded so when it made the last shift or rotate it would signal a 32-bit word. I thought it was neat. But I recall wanting to do something similar that might have been on PowerPC. And it was suggested to simply shift it and put the data in whole, rather than slowly bit by bit. Makes sense. Of course you need another register for a bit count then.
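The "put the data in whole" approach looks roughly like this in C (an illustrative MSB-first bit reader; no bounds checking, and field widths limited to 1..24 bits):

Code:

#include <stdint.h>
#include <stddef.h>

/* Keep a 32-bit buffer of pending bits and pull whole fields out of it,
   instead of rotating one bit at a time through the carry flag. */
typedef struct {
    const uint8_t *src;   /* compressed byte stream */
    size_t pos;           /* next byte to fetch */
    uint32_t buf;         /* pending bits, left-aligned */
    unsigned avail;       /* number of valid bits in buf */
} BitReader;

static uint32_t get_bits(BitReader *br, unsigned count)   /* 1 <= count <= 24 */
{
    while (br->avail < count) {                  /* refill one byte at a time */
        br->buf  |= (uint32_t)br->src[br->pos++] << (24 - br->avail);
        br->avail += 8;
    }
    uint32_t value = br->buf >> (32 - count);    /* take the top 'count' bits */
    br->buf  <<= count;
    br->avail -= count;
    return value;
}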

Quote:
The 68020 and x86/x64 have no such constraint, so the operation is MUCH easier and faster. For the 68020 it's also a piece of cake, because you can use just ONE bitfield instruction to extract the value or insert the new one. More modern x86/x64 CPUs have similar instructions.


Sounds good to me. Neat. A free lunch. Okay not exactly free but still good.
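For reference, a rough portable C equivalent of that single-instruction bitfield access (a sketch only, assuming depths of 1..8 bits, Amiga-style big-endian bit order, and one spare byte at the end of the bitmap so the 16-bit window never reads past the buffer):

Code:

#include <stdint.h>

/* Rough C equivalent of what one 68020 BFEXTU/BFINS does: read or write an
   n-bit pixel starting at an arbitrary bit offset in a packed bitmap. */
static unsigned get_pixel(const uint8_t *bitmap, uint32_t x, unsigned depth)
{
    uint32_t bit   = x * depth;
    uint32_t byte  = bit >> 3;
    unsigned shift = bit & 7;
    uint16_t window = (uint16_t)((bitmap[byte] << 8) | bitmap[byte + 1]);
    return (window >> (16 - shift - depth)) & ((1u << depth) - 1u);
}

static void put_pixel(uint8_t *bitmap, uint32_t x, unsigned depth, unsigned value)
{
    uint32_t bit   = x * depth;
    uint32_t byte  = bit >> 3;
    unsigned shift = bit & 7;
    uint16_t mask   = (uint16_t)(((1u << depth) - 1u) << (16 - shift - depth));
    uint16_t window = (uint16_t)((bitmap[byte] << 8) | bitmap[byte + 1]);
    window = (uint16_t)((window & ~mask) | ((value << (16 - shift - depth)) & mask));
    bitmap[byte]     = (uint8_t)(window >> 8);
    bitmap[byte + 1] = (uint8_t)window;
}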

Quote:
As I said before, the ONLY advantage of bitplanes comes when you have to access a single plane, or a few planes (less than the depth), which is quite rare.


I had to think of something. I always wondered if anyone thought to create a paint package like that. PlanarPaint. Ha. It would be almost like having 8 single colour giant screen sized sprites superimposed on each other. At least for AGA.

Quote:
Even more than 64. The problem is only for bitplanes: the wider the data bus, the worse is the alignment restriction -> wasted space & memory bandwidth.


And it was intended to optimise the memory bandwidth.

Quote:
You're asking a completely different thing here. The Blitter was good for its specific purpose, and it cannot be changed like you stated. A new, different 3D unit is better.


It may look completely different but I'm not thinking of 3d specifically but building on concepts already there. For example the line mode could have been extended to read the texture from a bitplane. So a bitplane line could have been used as a line texture.

Given the blitter was made to take in rectangular bit patterns from one position and blit them to another position, using the barrel shifter, it would have been useful if it was extended so it could shrink or expand a bit pattern for scaling. I realise it would have had to group bits in blocks and skip bits, as well as deal with double scaling or halving where every second bit would be discarded. Easy on the vertical but harder on the horizontal.

Warping would be harder I know. Since it would need to scale on the width, height, or in some cases both. Just would have been nice to see.

Quote:
No, I mean: if you want to achieve 1/2 or 1/4 pixel scrolling, then you can use the same BPLCON1 register for applying it. BPLCON1 is also needed for packed graphic which has a depth < 8.


Oh I was thinking on another system besides Amiga. I thought we were talking about scrolling with packed pixels. Doing quarter pixel scrolling on some chunky hardware.

Quote:
Believe me: there's absolutely no difference with the Amiga scrolling. Only an advantage if you have packed graphic with depth 8, 16, 24 and 32, and IF you don't need 1/2 and 1/4 pixel scrolling, because in this case you can implement the hardware scrolling just using the packed "plane" pointer (whereas on Amiga you need to update all bitplanes pointer AND set BPLCON1).


In that copper list, yes, which is easy to forget about. I can think of a quarter scroll being done with, say, a 1280x1024 screen that has scaled-up 320x256 graphics. But it would be just about redundant, and extra work to simulate it.

Quote:
AFAIR Atari ST line had bitplanes. And NO Blitter


What were they thinking?

But I was thinking of the Atari Falcon Amiga killer.

Quote:
You don't need it: better to avoid it on AGA re-implementations.


UHRes implies some kind of ultra high res mode. Like 2560x1024. Looks cool.


OneTimer1 
Re: 68k Developement
Posted on 24-Oct-2018 19:25:49
#465 ]
Cult Member
Joined: 3-Aug-2015
Posts: 980
From: Unknown

@matthey

Quote:

matthey wrote:
Quote:

OneTimer1 wrote:
... the topic has now drifted away to fantasy architectures that no one could use in an existing 68k system.


Tell us the limitations which give "non-existing possibilities".



There are only 3 reasons:
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU

They could not use 64-bit.

cdimauro 
Re: 68k Developement
Posted on 25-Oct-2018 6:26:59
#466 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Hypex Quote:
Hypex wrote:
@cdimauro Quote:
Same for me: I prefer 8, 16 and 32 bits packed pixels. But if we need to optimize space and/or bandwidth usage (like we were discussing), then they should be considered.

Fair enough. Also would be good if compositing was used. Though that kind of hardware blitting doesn't usually place those kind of limits on asset data.

I don't know if compositing supports CLUT modes, but anyway there's absolutely no difference from this PoV.
Quote:
Quote:
The 68020 and x86/x64 have no such constraint, so the operation is MUCH easier and faster. For the 68020 it's also a piece of cake, because you can use just ONE bitfield instruction to extract the value or insert the new one. More modern x86/x64 CPUs have similar instructions.

Sounds good to me. Neat. A free lunch. Okay not exactly free but still good.

It's heaven compared to bitplanes.
Quote:
Quote:
Even more than 64. The problem is only for bitplanes: the wider the data bus, the worse is the alignment restriction -> wasted space & memory bandwidth.

And it was intended to optimise the memory bandwidth.

It was sold this way, but I don't know if it was due to ignorance or pure marketing propaganda.
Quote:
Quote:
You're asking a completely different thing here. The Blitter was good for its specific purpose, and it cannot be changed like you stated. A new, different 3D unit is better.

It may look completely different but I'm not thinking of 3d specifically but building on concepts already there. For example the line mode could have been extended to read the texture from a bitplane. So a bitplane line could have been used as a line texture.

Given the blitter was made to take in rectangular bit patterns from one position and blit them to another position, using the barrel shifter, it would have been useful if it was extended so it could shrink or expand a bit pattern for scaling. I realise it would have had to group bits in blocks and skip bits, as well as deal with double scaling or halving where every second bit would be discarded. Easy on the vertical but harder on the horizontal.

Warping would be harder I know. Since it would need to scale on the width, height, or in some cases both. Just would have been nice to see.

Unfortunately it's completely different from the simple Blitter design, even for applying a general pattern for line drawing.
Quote:
Quote:
No, I mean: if you want to achieve 1/2 or 1/4 pixel scrolling, then you can use the same BPLCON1 register for applying it. BPLCON1 is also needed for packed graphic which has a depth < 8.

Oh I was thinking on another system besides Amiga. I thought we were talking about scrolling with packed pixels. Doing quarter pixel scrolling on some chunky hardware.

Yes, we were talking in general comparing bitplanes and packed graphic, so quarter pixel scrolling is part of the discussion.

To summarize, for a packed-graphics "Amiga version" (so, same chipset, but with packed graphics, and still limited to a max depth of 8) you should imagine the current one that you know, but with a few changes (a register-level sketch follows this list):
- only 2 pointers and 2 DAT registers for the two playfields, instead of 8 for both data types;
- an optional additional 3-bit field for specifying a separate depth for the second playfield, if you want to better define and use the DP (dual playfield) mode;
- two additional 3-bit fields for the B and C+D channels, to let the Blitter know the depth of the source and the target channels;
- an additional bit to let the Blitter automatically generate the mask from channel B (instead of using channel A for it; B is the source -> BOB data); the generated mask is treated as channel A data;
- an optional B "mask" register which is immediately applied (ORed) to the B channel data, which allows some special effects (like palette/color selection); this register is rotated (instead of shifted) like the B channel data;
- the Blitter logic is extended so that 0 -> no packed data, 1 -> take the packed data.
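Purely as an illustration of the list above, a hypothetical C view of such a register set; every name and field width here is invented for the sketch, not taken from any real chipset:

Code:

#include <stdint.h>

/* Hypothetical register map for the packed-pixel chipset variant sketched
   above. All names and widths are invented for illustration only. */
typedef struct {
    uint32_t pf_ptr[2];          /* one pointer per playfield, replacing 8 bitplane pointers */
    uint16_t pf_dat[2];          /* one DAT register per playfield */
    uint8_t  pf1_depth : 3;      /* depth of playfield 1, 1..8 bits per pixel */
    uint8_t  pf2_depth : 3;      /* optional separate depth for playfield 2 (dual playfield) */
    uint8_t  blit_b_depth : 3;   /* Blitter: depth of source channel B */
    uint8_t  blit_cd_depth : 3;  /* Blitter: depth of target channels C and D */
    uint8_t  blit_auto_mask : 1; /* auto-generate the channel A mask from B (BOB data) */
    uint16_t blit_b_mask;        /* ORed onto B data and rotated with it (palette tricks) */
} PackedChipRegs;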
Quote:
Quote:
Believe me: there's absolutely no difference with the Amiga scrolling. Only an advantage if you have packed graphic with depth 8, 16, 24 and 32, and IF you don't need 1/2 and 1/4 pixel scrolling, because in this case you can implement the hardware scrolling just using the packed "plane" pointer (whereas on Amiga you need to update all bitplanes pointer AND set BPLCON1).

In that copper list yes which is easy to forget about. I can think of a quarter scroll being done with say a 1280x1024 screen that has scaled up 320x256 graphics. But it would be redundant just about and extra work to simulate it.

See above. And yes: basically quarter-pixel scrolling is scaled graphics making use of the TV signal's properties. On everything else (and modern) you have no such half or quarter pixels.
Quote:
Quote:
AFAIR Atari ST line had bitplanes. And NO Blitter

What were they thinking?

But I was thinking of the Atari Falcon Amiga killer.

Ah, ok, there was the Falcon. I was thinking about ST and STE.
Quote:
Quote:
You don't need it: better to avoid it on AGA re-implementations.

UHRes implies some kind of ultra high res mode. Like 2560x1024. Looks cool.

But it was only monochrome (1 bitplane).

cdimauro 
Re: 68k Developement
Posted on 25-Oct-2018 6:34:35
#467 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@OneTimer1 Quote:
OneTimer1 wrote:
@matthey Quote:

matthey wrote:

Tell us the limitations which give "non-existing possibilities".

There are only 3 reasons:
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU

So, only one reason, which is already addressed by keeping the current 68K execution mode, as matt and I discussed.
Quote:
They could not use 64-bit.

Wrong again: you can use 64-bit, with the new 64-bit execution mode.

To give another example, the Vampire has already extended the CPU to use both 64-bit registers and new registers, and Exec is patched accordingly in order to correctly support those new features in a transparent way (you can run both Amiga hardware-hitting games and AmigaOS applications).

megol 
Re: 68k Developement
Posted on 25-Oct-2018 16:01:34
#468 ]
Regular Member
Joined: 17-Mar-2008
Posts: 355
From: Unknown

@OneTimer1
Quote:

There are only 3 reasons:
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU

They could not use 64-bit.

Not directly without porting/rewriting the OS, but it can still be useful: for instance, running multiple virtual machines with up to 4 GiB available for each one, or running some components in 64-bit mode while keeping all other software in 32-bit mode.

I think we are trying to imagine some way the 68k could become relevant again, at least for a small niche. Then one has to look at what is expected in 2018 and beyond. There's already ColdFire for embedded products and 68k processors for legacy operations, but nothing supporting larger amounts of memory for higher-end embedded, for instance.
And even something like the Raspberry Pi has 64-bit capable processors now.

However I agree this is all fantasy at least until these ideas become real (which I don't expect to happen). :)

matthey 
Re: 68k Developement
Posted on 25-Oct-2018 20:28:01
#469 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2009
From: Kansas

Quote:

OneTimer1 wrote:
There are only 3 reasons:
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU
- existing 68k systems need a 32-bit CPU

They could not use 64-bit.


Is this a 68k problem or an Amiga problem?

- existing Amiga PPC systems need a 32-bit CPU
- existing Amiga PPC systems need a 32-bit CPU
- existing Amiga PPC systems need a 32-bit CPU

With 64 bit 68k support, AROS 64 bit would compile for the 68k as well as a PPC 64 bit target. Why hasn't this been done for 64 bit PPC CPUs? It is because it is an Amiga compatibility problem. Is AArch64, PPC 64 bit, x86_64 or 68k 64 bit more likely to maintain compatibility? While a new and proper 68k design is a long shot, it is still the best chance to maintain compatibility while adding 64 bit support. Also, a custom 68k CPU is probably the best chance to add security while retaining compatibility. This would require investment and a change of marketing strategies from niche market to mass market (for an ASIC) but the current Amiga strategy is broken and likely not profitable.

bison 
Re: 68k Developement
Posted on 3-Nov-2018 15:20:38
#470 ]
Elite Member
Joined: 18-Dec-2007
Posts: 2112
From: N-Space

@matthey

Quote:
There are plenty of embedded CPUs like the Cortex-M and Cortex-R series.

https://en.wikipedia.org/wiki/ARM_Cortex-M

There are older ARM CPUs like the Cortex-A8 which are "superseded" but still available and popular. Fido is in-order. Most of the Cast CPUs are in-order.

http://www.cast-inc.com/ip-cores/processors32bit/index.html

Some designs are partial/limited OoO like in-order issue OoO completion. I expect this is what the BA25, most PowerPC, many ARM CPUs and probably Apollo core are using. It is OoO issue OoO completion (but still in-order graduation) which can be very complex and use insane amounts of energy.

A new in-order CPU:

https://www.sifive.com/press/sifive-core-ip-7-series-creates-new-class-of-embedded

Quote:
The 7 series is designed on a highly optimized 8-stage in-order pipeline, which introduces microarchitectural features to prevent side channel attacks thereby enabling a robust and secure processor implementation.


_________________
"Unix is supposed to fix that." -- Jay Miner

matthey 
Re: 68k Developement
Posted on 4-Nov-2018 19:23:14
#471 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2009
From: Kansas

Quote:

bison wrote:
A new in-order CPU:

https://www.sifive.com/press/sifive-core-ip-7-series-creates-new-class-of-embedded

Quote:
The 7 series is designed on a highly optimized 8-stage in-order pipeline, which introduces microarchitectural features to prevent side channel attacks thereby enabling a robust and secure processor implementation.


Yes. They are aiming at the huge mid-performance embedded CPU market with a scalable 64-bit processor. The security problems of more speculative CPU designs allow simpler processors to get a foot in the door against ARM. Good marketing literature. The big challenge is the lack of software and tools for RISC-V. This is the market the 68k should have gone after, as it has more software and 20%-25% better code density than the most compact RISC-V code variants.

cdimauro 
Re: 68k Developement
Posted on 5-Nov-2018 6:11:28
#472 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@matthey Quote:

matthey wrote:
Quote:
bison wrote:
A new in-order CPU:

https://www.sifive.com/press/sifive-core-ip-7-series-creates-new-class-of-embedded

Quote:
The 7 series is designed on a highly optimized 8-stage in-order pipeline, which introduces microarchitectural features to prevent side channel attacks thereby enabling a robust and secure processor implementation.

Yes. They are aiming at the huge mid-performance embedded CPU market with a scalable 64-bit processor. The security problems of more speculative CPU designs allow simpler processors to get a foot in the door against ARM. Good marketing literature.
We already discussed this, especially with megol: even in-order processors are subject to some side-channel attacks.
OoO will not be dropped because of this. In fact, even SiFive continues to sell OoO designs. No processor vendor is dropping its OoO products.
Quote:
The big challenge is the lack of software and tools for RISC-V. This is the market the 68k should have gone after as it has more software

I don't know how true that is. RISC-V is already ported to some Linux distros, and AFAIR it has an LLVM toolchain as well, which enables the usage of tons of open source software on this platform.

How is Linux support for 68K?

Finally, we know that only an old GCC version is officially supported, which doesn't generate good code. Bebbo's work is great, but it should be integrated into GCC's master branch, in order to have better support and, especially, be usable by end users without messing with patches. This is something which should be seriously pushed to give the 68K a better chance, so I hope that someone can convince Bebbo to do it, or help out here.
Quote:
and 20%-25% better code density than the most compact RISC-V code variants.

I don't remember a direct analysis of 68K and RISC-V: do you have some nice paper to read, or is it your personal feeling?

BTW, I've added the prologue/epilogue optimizations to my ISA, following the idea which the RISC-V designers had, and it shows a good improvement in code density (but 2 more jumps need to be executed): up to 5% for 32-bit code and up to 2% for 64-bit code (because I already have good support for accessing stack variables).
So, the idea seems good enough to be implemented for ISAs which lack multiple-register load/store. Of course it doesn't make sense for the 68K, which already has the nice MOVEM instruction.

hth313 
Re: 68k Developement
Posted on 5-Nov-2018 15:40:44
#473 ]
Regular Member
Joined: 29-May-2018
Posts: 159
From: Delta, Canada

@cdimauro

Quote:

cdimauro wrote:

BTW, I've added the prologue/epilogue optimizations to my ISA, following the idea which RISC-V designers had, and it shows a good improvement in code density (but 2 more jumps need to be executed): up to 5% for 32-bit code and up to 2% to 64-bit one (because I already have a good support for accessing stack variables).
So, the idea seems good enough to be implemented for ISAs which lack multiple registers load/stores. Of course it doesn't make sense for 68K, which already has the nice MOVEM instruction.


Thank you for the link to this RISC-V paper you gave me earlier.

I looked at this prologue trick of RISC-V, and essentially the trick is that they can avoid complicated instructions that violate their design principles, thanks to the fact that they can choose the link register used in the call to a tailored routine (they often want to save the real link register). This allows for tailored calls to support routines that can do the job of preserving callee-save registers (including the RA link register).

Furthermore, I think their 16-byte stack alignment kind of works for them here, though they do not talk about it. This could allow for saving 4 registers at a time (RV32), reducing the number of such call routines. On the other hand, the stack alignment can also be used to provide some stack slots for free when saving fewer registers. I wonder if they thought about this when they defined up to 13 registers for the caller to save in their ABI (S0-S11 and RA).

I do not understand why they talk about RVC (compressed instructions) in this context and say that compilers need to be RVC-aware. In case the compiler is not RVC-aware (does not try to favour code selection that benefits the compressed instruction set), it will just think in ordinary wide instructions, and there you would have the multi-save instructions. A compiler may very well be RVC-aware for code selection purposes, but this idea has nothing to do with it as far as I can see.

Reference:
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#register-convention

cdimauro 
Re: 68k Developement
Posted on 5-Nov-2018 20:04:23
#474 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@hth313 Quote:
hth313 wrote:
@cdimauro
Quote:
cdimauro wrote:

BTW, I've added the prologue/epilogue optimizations to my ISA, following the idea which RISC-V designers had, and it shows a good improvement in code density (but 2 more jumps need to be executed): up to 5% for 32-bit code and up to 2% to 64-bit one (because I already have a good support for accessing stack variables).
So, the idea seems good enough to be implemented for ISAs which lack multiple registers load/stores. Of course it doesn't make sense for 68K, which already has the nice MOVEM instruction.

Thank you for the link to this RISC-V paper you gave me earlier.

I looked at this prologue trick of RISC-V, and essentially the trick is that they can avoid complicated instructions that violate their design principles, thanks to that they can choose the link register used in the call to a tailored routine (they often want to save the real link register). This allows for tailored calls to support routines that can do the job of preserving callee-save registers (including the RA link register).

Exactly, and that's what I've implemented as well, but with a difference: I just use my peephole optimizer to catch specific patterns of instructions and replace them with a (quick) function call.

This approach was easy to implement, but doesn't provide the best results, because I had to define a specific set of patterns to be replaced, and use/apply them to all binaries which I disassemble, which sometimes use quite different patterns in their prologues/epilogues.

Implementing this in a compiler will surely give much better code density, because you standardize the prologues/epilogues; it also allows properly aligning and restoring the stack pointer, which is currently not possible with my patterns (I can only save/restore registers on the stack, or push/pop them; no stack pointer manipulation is possible, because such instructions are located in different parts of the prologues/epilogues).

Looking at the example that they provide in the paper (page 67 of https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.pdf ), it's quite evident to me that I could gain much more compared to their implementation by not using the peephole optimizer and moving that part to a compiler, like they did.
Quote:
Furthermore, I think their 16-byte stack alignment kind of works for them here, though they do not talk about it.

Indeed. I haven't seen parts mentioning this 16-byte alignment. However it should be like x64, if I remember correctly.
Quote:
This could allow for saving 4 register at a time (RV32), reducing the number of such call routines. On the other hand, the stack alignment can also be used to provide some stack slots for free when saving fewer registers. I wonder if they thought about this when they defined up to 13 registers for the caller to save in their ABI (S0-S11 and RA).

That's what happens on x64 as well. I saw many times, in the prologues and epilogues, that some registers are directly saved on the stack at specific locations, and restored back in the opposite way when exiting.

Which is strange, since we know that PUSHs and POPs can be used, which are way more compact (in fact code density on x64 is worse also for this reason: such MOVs in the prologues and epilogues are MUCH longer than PUSHs and POPs!).

It's even more strange since I've observed that frequently prologues and epilogues have mixed types of instructions: some registers are moved to/from the stack, and some are pushed/popped. Weird...
Quote:
I do not understand why they talk about RVC (compressed instructions) in this context and that compilers need to be RVC-aware. In case the compiler is not RVC-aware (do not try to favour code selection that benefits the compressed instruction set), it will just think ordinary wide instructions and there you would have the multi-save instructions. A compiler may very well be RVC-aware for code selection purposes, but this idea has nothing to do with it as far as I can see.

Reference:
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#register-convention

Yeah, same for me: it's something that doesn't make sense. I can only understand it if they are talking about linking binaries that can be compiled for the regular RISC-V ISA and for the compressed one. There can be issues if you are allowed to link those binaries to generate a final binary which should only run on the regular RISC-V ISA. Otherwise, again, I don't see any need for "awareness".

matthey 
Re: 68k Developement
Posted on 6-Nov-2018 1:35:42
#475 ]
Elite Member
Joined: 14-Mar-2007
Posts: 2009
From: Kansas

Quote:

cdimauro wrote:
We already discussed this, especially with megol: even in-order processors are subject to some side-channel attacks.
OoO will not be dropped because of this. In fact, even SiFive continues to sell OoO designs. No processor vendor is dropping its OoO products.


I agree but simpler CPUs are more appealing and marketable now as the linked literature shows. KISS should now stand for Keep It Simple and Secure.

Quote:

I don't know how true that is. RISC-V is already ported to some Linux distros, and AFAIR it has an LLVM toolchain as well, which enables the usage of tons of open source software on this platform.

How is Linux support for 68K?


There are too many RISC-V variations and the tools are less mature than most other architectures.

Most 68k Linux/BSD support is dated but so is most of the hardware it runs on.

Quote:

Finally, we know that only an old GCC version is officially supported, which doesn't generate good code. Bebbo's work is great, but it should be integrated into GCC's master branch, in order to have better support and, especially, be usable by end users without messing with patches. This is something which should be seriously pushed to give the 68K a better chance, so I hope that someone can convince Bebbo to do it, or help out here.


Bebbo is making a valiant effort but I don't know if the changes are ready or how easy they would be to incorporate into the GCC master branch. Chris Young and I tried to get NetSurf running with Bebbo's changes a while ago and ran into bugs and problems (we reported what we could find). That was a while ago so maybe it is working better now.

Quote:

I don't remember a direct analysis of 68K and RISC-V: do you have some nice paper to read, or is it your personal feeling?


Sorry, I have no papers with a comprehensive comparison (pro RISC-V papers conveniently leave out Thumb2 and 68k code density comparisons as we've seen). It was just an estimate based on multiple sources. Vince Weaver's results have the 68k at 22% better code density than RISCV32IMC which has 29% better code density than the PPC (68k is 45% better code density than PPC). Not the best source but it has many architectures including unusual and old ones.
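(As a quick sanity check, assuming "X% better code density" means X% smaller code and that the ratios compose multiplicatively: 0.78 x 0.71 is roughly 0.55, so the 68k comes out at about 55% of the PPC code size, i.e. roughly 45% smaller, which matches the figure above.)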

cdimauro 
Re: 68k Developement
Posted on 6-Nov-2018 5:46:39
#476 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@matthey Quote:
matthey wrote:
Quote:
cdimauro wrote:
We already discussed this, especially with megol: even in-order processors are subject to some side-channel attacks.
OoO will not be dropped because of this. In fact, even SiFive continues to sell OoO designs. No processor vendor is dropping its OoO products.

I agree but simpler CPUs are more appealing and marketable now as the linked literature shows. KISS should now stand for Keep It Simple and Secure.

Which is not possible with OoO designs.

Anyway, I want to clarify that I'm NOT against in-order designs. I personally believe that a CISC CPU can be a better performer compared to RISCs, due to the intrinsic ability to do "more useful work" per single instruction. I think that here a CISC design can shine.

But I also think that OoO designs are here to stay, because they provide the best performances.
Quote:
Quote:
I don't know how true that is. RISC-V is already ported to some Linux distros, and AFAIR it has an LLVM toolchain as well, which enables the usage of tons of open source software on this platform.

How is Linux support for 68K?

There are too many RISC-V variations and the tools are less mature than most other architectures.

The situation is improving constantly. Too many big companies are involved here, and they have money (developers) to spend.
Quote:
Most 68k Linux/BSD support is dated but so is most of the hardware it runs on.

So, I assume that they are stuck with old kernel/tools/compiler versions.
Quote:
Quote:

Finally, we know that only an old GCC version is officially supported, which doesn't generate good code. Bebbo's work is great, but it should be integrated into GCC's master branch, in order to have better support and, especially, be usable by end users without messing with patches. This is something which should be seriously pushed to give the 68K a better chance, so I hope that someone can convince Bebbo to do it, or help out here.

Bebbo is making a valiant effort but I don't know if the changes are ready or how easy they would be to incorporate into the GCC master branch. Chris Young and I tried to get NetSurf running with Bebbo's changes a while ago and ran into bugs and problems (we reported what we could find). That was a while ago so maybe it is working better now.

Hum. From your writing it seems that it's still not mature enough.

GCC should have a test suite: it would be good to give it a run and see how many failures there are, and if/how many serious ones are found. Of course, some new tests might be needed to exercise new scenarios (e.g. new optimizations).
Quote:
Quote:

I don't remember a direct analysis of 68K and RISC-V: do you have some nice paper to read, or is it your personal feeling?

Sorry, I have no papers with a comprehensive comparison (pro RISC-V papers conveniently leave out Thumb2 and 68k code density comparisons as we've seen). It was just an estimate based on multiple sources. Vince Weaver's results have the 68k at 22% better code density than RISCV32IMC which has 29% better code density than the PPC (68k is 45% better code density than PPC). Not the best source but it has many architectures including unusual and old ones.

OK, it's fair enough. However, I would not rely too much on Vince's contest.

IMO it would be better to make some "triangulation" with some bigger/common software for which there are code density metrics for both 68K and RISC-V (and ARM, x86, x64 would be good to have). AFAIR you provided a link using GCC and other known software, but it will be difficult to find an equivalent for RISC-V.

Hypex 
Re: 68k Developement
Posted on 1-Dec-2018 0:29:55
#477 ]
Elite Member
Joined: 6-May-2007
Posts: 11215
From: Greensborough, Australia

@cdimauro

Sorry for my belated reply. Got kinda busy.

Quote:
It was sold this way, but I don't know if it was due to ignorance or pure marketing propaganda.


I suppose it could have been. It can be common in some systems. There are likely some other graphic chipsets that like the framebuffer or texture to be in some kind of alignment. Of course, by that stage 32-bpp already has an alignment.

So within 8-bit graphic modes could all common chunky chipsets place the chunky map at arbitrary locations?

Quote:
Unfortunately it's completely different from the simple Blitter design, even for applying a general pattern for line drawing.


Perhaps. But they would have faced this need eventually. Even at the design stage. All built on single bitmap operations. Without such features the bitmap needs to be split into even smaller blocks to be blitted then repeated per bitplane.

Quote:
Yes, we were talking in general comparing bitplanes and packed graphic, so quarter pixel scrolling is part of the discussion.


I'm wondering if S/VGA could do quarter scrolling like the Amiga. Such as scrolling a low res mode at a high res granularity.

Quote:
To synthesize, for a packed graphic "Amiga version" (so, same chipset, but with packed graphic, and still limited to a max depth = 8) you should imagine the current one that you know, but with a few changes:




Quote:
See above. And yes: basically quarter pixels is scaled graphic making use of the TV signal properties. On everything else (and modern) you have no such half or quarter pixels.


No, since it was just a trick really, and most things were low res back then.

Quote:
Ah, ok, there was the Falcon. I was thinking about ST and STE.


Yes, the ST/E didn't exactly kill the A1200 in all areas. Well, in graphics and audio is where the Falcon whipped the Amiga. Even without a copper, if it lacked one.

Quote:
But it was only monochrome (1 bitplane).


Another Mac mode. Ah forget it. Shapeshifter novelty screenmode.

megol 
Re: 68k Developement
Posted on 1-Dec-2018 21:50:15
#478 ]
Regular Member
Joined: 17-Mar-2008
Posts: 355
From: Unknown

@Hypex

Quote:

Hypex wrote:
Quote:
Yes, we were talking in general comparing bitplanes and packed graphic, so quarter pixel scrolling is part of the discussion.


I'm wondering if S/VGA could do quarter scrolling like the Amiga. Such as scrolling a low res mode at a high res granularity.


If I remember correctly that isn't possible. It has been ages since I touched VGA programming (over 20 years!), so take it with a pinch of salt.

cdimauro 
Re: 68k Developement
Posted on 2-Dec-2018 11:48:57
#479 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Hypex Quote:
Hypex wrote:
@cdimauro

Sorry for my belated reply. Got kinda busy.

Np. Common problem, unfortunately... :-/
Quote:
Quote:
It was sold this way, but I don't know if it was due to ignorance or pure marketing propaganda.

I suppose it could have been. It can be common in some systems. There's likely some other graphic chipsets that like the framebuffer or texture to be in some kind of alignment. Of course by that stage 32-bpp already has an alignment.

An alignment is natural/needed by the display logic, but it can easily be masked by pre-fetching some data in the case of hardware scrolling.
Quote:
So within 8-bit graphic modes could all common chunky chipsets place the chunky map at arbitrary locations?

I don't know all packed/chunky chipsets, but I can say that VGA allowed/allows scrolling the displayed area inside the video memory without any restriction.

With its 256KB of video memory you can, for example, define a 640x400@8bit "virtual screen" with the VGA, and show any 320x200 portion of it.
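For illustration, the VGA side of this is just a few register writes; a hedged C sketch (port_write_u8 stands in for whatever port-I/O primitive the environment provides, and the address units of the start/offset registers depend on the video mode):

Code:

#include <stdint.h>

/* Hypothetical port-write primitive (DOS outportb, Linux outb after ioperm,
   etc.); not a real library function. */
extern void port_write_u8(uint16_t port, uint8_t value);

#define CRTC_INDEX 0x3D4   /* VGA CRT controller index port */
#define CRTC_DATA  0x3D5   /* VGA CRT controller data port  */

/* Pick where in the 256 KB of video memory the visible window starts
   (CRTC Start Address High/Low, indices 0x0C/0x0D). */
static void vga_set_start(uint16_t start)
{
    port_write_u8(CRTC_INDEX, 0x0C);
    port_write_u8(CRTC_DATA, (uint8_t)(start >> 8));
    port_write_u8(CRTC_INDEX, 0x0D);
    port_write_u8(CRTC_DATA, (uint8_t)start);
}

/* Set the logical line width (CRTC Offset register, index 0x13) so the
   stored "virtual screen" can be wider than what is displayed. */
static void vga_set_logical_width(uint8_t offset)
{
    port_write_u8(CRTC_INDEX, 0x13);
    port_write_u8(CRTC_DATA, offset);
}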
Quote:
Quote:
Unfortunately it's completely different from the simple Blitter design, even for applying a general pattern for line drawing.

Perhaps. But they would have faced this need eventually. Even at the design stage. All built on single bitmap operations. Without such features the bitmap needs to be split into even smaller blocks to be blitted then repeated per bitplane.

Too much work: not feasible. Here we are moving to a texture-mapping unit, which was completely out of scope at the time.

The most that you can think about is still implementing a line mode, but using an extra register holding the color to be used for the pixels being drawn.
Quote:
Quote:
Yes, we were talking in general comparing bitplanes and packed graphic, so quarter pixel scrolling is part of the discussion.

I'm wondering if S/VGA could do quarter scrolling like the Amiga. Such as scrolling a low res mode at a high res granularity.

No, it wasn't possible. As I said before, that was a trick used by the Amiga chipset, making use of the TV signal.

You can emulate it by defining a higher-resolution screen (e.g. 1280x200) and copying the 320x200 graphics 4 times (for quarter-pixel scrolling; 640x200 graphics 2 times for half-pixel scrolling), but it's not as easy as the Amiga made it.
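A one-scanline C sketch of that emulation idea (nearest-neighbour 4x horizontal scaling, purely illustrative):

Code:

#include <stdint.h>

/* Render a 320-pixel-wide source line into a 1280-pixel-wide buffer.
   A "quarter pixel" scroll then becomes an ordinary 1-pixel shift,
   expressed here as subpixel_shift in the range 0..3. */
static void scale_line_4x(const uint8_t *src320, uint8_t *dst1280,
                          unsigned subpixel_shift)
{
    for (unsigned x = 0; x < 1280; x++) {
        unsigned sx = (x + subpixel_shift) / 4;     /* shifted source pixel */
        dst1280[x] = src320[sx < 320 ? sx : 319];   /* clamp at the right edge */
    }
}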
Quote:
Quote:
Ah, ok, there was the Falcon. I was thinking about ST and STE.

Yes the ST/E didn't exactly kill the A1200 in all areas. Well in graphics and audio. Where the Falcon whipped the Amiga. Even without a copper if it lacked it.

However the Falcon had several limits.

pgf_666 
Re: 68k Developement
Posted on 11-Dec-2018 3:51:55
#480 ]
Member
Joined: 29-Dec-2007
Posts: 45
From: Unknown

@everyone

Can I point out a few things, from a user's POV?

High code density is nice, but it isn't the holy grail. For me, that grail is performance and ease of use. I want something that gives me the equivalent of a 68060 running at 32 GHz (8 cores at 4 GHz should be close enough), great graphics, of course (otherwise I'd be on a different board), and the ability to run my 32-bit apps at native speed, with full integration; that's why I haven't switched to the 64-bit version of AROS!

The OS is important, of course, but 3.9+ with a couple of tweaks should do it: expand the memory map to 64-bit addressing, etc.

I do virtually ALL my personal programming, including some stuff for the Win-Don't environment, in UAE, because it's easier, and sometimes even faster... (Yeah, I know, sick, isn't it?)

And if the register files get a bit big: have you SEEN the size of modern chips, and figured out how many 68060s you could put on them at current feature sizes?

pgf
