matthey 
Re: 68k Developement
Posted on 10-Oct-2018 0:32:07
#421

Quote:

cdimauro wrote:
That's impressive. I wouldn't have expected a so high frequency for "address registers" instructions.


Take a look at the "Instruction Mix" for SPEC INT92 on page 55 of the following pdf.

https://www.cl.cam.ac.uk/teaching/0405/CompArch/mynotes.pdf

Although the chart is for RISC, on the 68k those loads and stores become either a MOVE or part of a reg-mem instruction. Of the top 12 most frequent instructions, only OR, some shifts, AND and MUL/DIV require a data register.

These notes have some excellent information. There is a small code density comparison chart on page 52 which includes the 68k. Despite the handicap of a statically linked libc, the 68k had the smallest programs in all 3 benchmarks, even though the x86 had better code density in one.

Quote:

I don't see how it can be based on the 68K: the ISA is different, and the opcode structure too. They implemented more or less the same addressing modes for load/store instructions, but I don't see other similarities. And, of course, they used 16-bit opcodes, but the structure is absolutely different.


The SuperH ISA looks like 68k instructions and addressing modes (MOV instead of load/store instructions, for example) but is encoded very differently. There were enough similarities that lawsuits followed between Motorola and Hitachi, where both were found to have violated their 1986 agreement. While the court case required Hitachi to pay more for violations, Motorola was in danger of being forced to halt shipments of 68030 CPUs. This resulted in a quick settlement.

Quote:

I briefly took a look at the BJX1 page, but I find it quite a complex ISA. I wonder how easy it can be to implement.


It is not so simple anymore, and some flaws of SuperH are difficult to overcome. The 68k would be minimally more complex to implement and easier to enhance, IMO.

Quote:

Nice link, and I wonder here too, seeing that the SuperH2 had better code density than the 68K. I haven't seen other studies where it showed such good code density. Maybe, again, a compiler-biased analysis?


The stats show "68000" instead of 68k or 68020 so a few percent of code density is lost there. A few percent more could have been lost by not using -fomit-frame-pointer. If they then cherry picked the benchmarks where SH did well, these results are probably possible. I expect the 68000 program sizes are still smaller because SH does not have enough encoding bits for immediates or displacements so constants end up in the data section (where they use valuable DCache). I have seen few code density charts which include SH but I believe it to be somewhere between Thumb1 and x86. Program size can easily be larger than x86 with data included though.

The SuperH results are marketing literature and even the EISC results may be marketing literature disguised as a scientific paper. Even scientists have biases and incorrect conclusions. Good papers have good methodologies and documentation that make the information valuable anyway. Let's look at some good code density studies.

SPARC16: A new compression approach for the SPARC architecture
https://www.researchgate.net/publication/221306454_SPARC16_A_new_compression_approach_for_the_SPARC_architecture

16-bit Vs. 32-bit Instructions For Pipelined Microprocessors
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.4647&rep=rep1&type=pdf

The biases are still there.

"Since the capacity of a processor to execute instructions typically exceeds the capacity of a memory to provide them, efficiency in the encoding of instruction information can be expected to have definite hardware and/or performance costs. Such considerations for many years supported the development of CISC processors."

Good! Advantage CISC.

"CISC instructions provide relatively compact encodings of computations, but this comes at the cost of complex decoding and execution, often requiring multiple processor cycles per instruction."

Variable length encodings are usually worth the minor cost and most modern CISC instructions are single cycle.

"These drawbacks have motivated widespread adoption of the RISC paradigm, which in pure form employs only simple instructions which can be decoded easily, execute in a single machine cycle, and facilitate pipelining of the processor."

The pure form of RISC is practically dead, and modern CISC hybrids can easily be pipelined.

"With the use of instruction caching and advanced compiler technology, RISC machines can provide significant performance advantages over CISC machines."

Modern CISC can use caching and advanced compiler technology too. It may even have a performance advantage from caching with reg-mem operations. Moving too much CPU complexity into the compiler results in a performance decline. Where is the RISC performance advantage?

"Moreover, architectural trends, such as parallel-issue machines, multiprocessors, and deeply pipelined machines tend to increase rather than decrease concern over instruction traffic as a performance bottleneck."

On average, the study finds 16 bit encodings decrease memory traffic by about 35% while increasing path lengths by about 15%. Longer path lengths (more instructions) are common for compressed RISC encodings, as is increased data memory traffic, which was not measured in this study.
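
Those two averages also imply something about the encoding mix; a quick back-of-envelope check in C (my derivation, not from the paper):

/* If 16-bit encodings add ~15% more instructions yet cut instruction
   fetch traffic ~35%, the implied average instruction-size ratio is
   well above the 0.5 a pure 16-bit encoding would give, i.e. a mix of
   16- and 32-bit forms remains. */
#include <stdio.h>

int main(void) {
    double traffic = 1.0 - 0.35; /* relative instruction fetch traffic */
    double count   = 1.0 + 0.15; /* relative instruction count (path length) */
    printf("implied average size ratio: %.3f\n", traffic / count); /* ~0.565 */
    return 0;
}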

"In current implementation technology, the second-order benefits of a denser 16-bit encoding can easily exceed the path length reduction achieved with the 32-bit format."

The 68k can have path lengths (instruction counts) smaller than most 32 bit RISC formats and less memory traffic than most compressed or 16 bit RISC formats. I expect the 68k code density is still superior, though the study did not compare it. "RISC machines can provide significant performance advantages over CISC machines", but it leaves me wondering how.

There are so many of these compressed RISC encoding research papers which never looked beyond the code density of CISC as a target for RISC compression. How could so many scientists and researchers be so ignorant of CISC advantages? Where is the CISC research? Regardless, there is some useful information provided.

Quote:
cdimauro wrote:
OK, but those are special cases. Usually libraries require A6 for their base, because they need to access their globally-shared data. They also don't know where the library base will be allocated, and that's particularly true for libraries which stay in ROM.

Take exec.library, for example, which is the worst case: how should it work without using A6?


Libraries in ROM would be the exception. Library sections and structures loaded into memory can all be merged. There are some unanswered questions like whether it would be better to reduce the kickstart size for easier maintenance as ThoR has suggested (requires some tricks for compatibility) or enlarge it so more of the OS can be write protected.

Quote:

This all takes space (transistor) and power.

But you forgot to mention the simplified MMU, whereas the Pentium brings all its legacy and introduced other things as well.


I did forget about the MMU, probably because it isn't used by default on the Amiga. The Pentium kept all the segmentation support too. Lots of baggage.

Quote:

Absolutely. As I've already stated, my ISA uses longer opcodes, so it's clear that code density will be affected when using them, although I have some mitigations (more features can be enabled at the same time, saving more instructions and/or registers). It's important to remark that using those features (prefixes or long instructions) you also get other benefits: a lower instruction count and/or less memory traffic.


True. Prefixes would likely reduce instruction counts and memory traffic while reducing code density, increasing the average instruction length and adding a small amount of complexity and latency to the decoder. While instruction counts and memory traffic are generally more important performance metrics, I expect the overall performance gains to be small.

Quote:

cdimauro wrote:
I reviewed the opcode table again, looking at Motorola's 68000 manual. It took me a while, because the list is badly organized, and it's not easy to figure out which 16-bit "slots" were used (especially which ones are left free to be used).

Anyway, I'm decisively convinced that my idea to have 16 data registers and 8 address registers, with a clear separation between the two, is doable by shuffling some instructions, and it should maintain a very similar code density (with margins to improve it, using the new data registers).


Feel free to try.

Quote:

BTW, looking at the opcode table I think that Motorola made a big mess: it looks like a patchwork, where its engineers just reused opcode space as they needed, without caring or thinking about a clearer and simpler organization. Decoding isn't trivial, and there's a huge waste of opcode space which could have been used much better.


It's messy in places but it looks worse to humans than it actually is. It is mostly the 68020 enhancements that have poor encodings. My 64 bit mode cleans it up some.

Quote:

Do you still really want to expand this monster? You can keep 68K assembly-level compatibility while getting a much better opcode table, with many more 16-bit "slots" available (no, I'm not talking about adding 8 new data register: just an opcode table redesign).


For 32 bit binary compatibility, it can't be cleaned. For a 64 bit mode, there are many options to consider.

Last edited by matthey on 10-Oct-2018 at 12:48 AM.

megol 
Re: 68k Developement
Posted on 10-Oct-2018 16:13:10
#422

@matthey
Quote:

matthey wrote:
Address registers allow addition and subtraction for up or down counters (common but needs an extra instruction in TST/CMP) or offsets between two value. ADD, SUB, CMP, TST and LEA are very common operations. In the code I looked at, opening address register sources (and even destinations) only gives a minor improvement in instruction counts and code density. For similar reasons, I would expect prefixes adding d8-d15 to give a minor improvement in instruction counts and memory traffic with worse code density in unusual cases where the divide would be a problem. The data and address register divide does not necessitate as bad of code as what some people would expect although compilers have more problem with it than humans (d8-d15 would be helpful to compilers *if* they were completely orthogonal data registers). IMO, it is not that the 68k is inherently a difficult target for compilers but rather there has been a lack of incentive to make modern 68k backends mature. I think Bebbo's GCC efforts show this and I know this is the case for the lack of effort in vbcc's backend as well.

If anything the 68k should be an easy target so I agree that lack of an incentive is the main problem.

Quote:

ADD already has An sources *and* destinations open.

add.l d0,d1 ; Dn,Dn: data to data, sets CC
add.l a0,d0 ; An source to data register, sets CC
add.l d0,a0 ; converted to adda.l with no CC change
add.l a0,a1 ; converted to adda.l with no CC change

There are no other variations to add here, no pun intended. ADD is orthogonal other than the way the CC is set and the way address registers sign extend sizes less than the register size.
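
Since the size extension rule is easy to mis-remember, here is a minimal C sketch of what a word-sized add into an address register does to its source (example values are mine):

/* Sketch of the rule above: a word-sized operation on an address
   register sign-extends the 16-bit source to the full register width,
   as adda.w does, before the add. Values are illustrative. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t a0 = 0x00010000;
    int16_t src = -4;            /* e.g. adda.w #-4,a0 */
    a0 += (int32_t)src;          /* sign-extended, then added */
    printf("a0 = 0x%08X\n", a0); /* 0x0000FFFC */
    return 0;
}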

I can only blame mental exhaustion; a day or two before writing that, I had actually looked at the ADD encoding. Almost scary actually. :/

I agree with you 100%.

Quote:

Toni Wilen recently added some FPU instruction logging to WinUAE in a matter of days.

http://eab.abime.net/showpost.php?p=1272424&postcount=9

He could probably add dynamic trace support and CPU performance counters (same as Apollo Core?). This would probably take us (people unfamiliar with the program) weeks to add. I would not ask Toni for this without a more serious effort.

I was familiar with the ADis disassembler (which I have improved), so I created a quick version in a matter of days which generates static stats for disassembled programs. Gunnar did make some decisions based on some of the statistics it generated, and of course that is where some of my knowledge of 68k code comes from.

Frank Wille could create peephole optimizations for vasm (the vbcc assembler) in a matter of days which would take me weeks. Vasm already has several ColdFire optimizations which could be turned on for a 68k target (some of which I suggested). My immediate compression could be a peephole optimization. These could give an idea of the code density improvement with the simplest ISA enhancements. As much as I have worked with Frank, and as nice as he has been, I would not ask him to help with what would likely turn out to be a wasted effort. It would be worse than him asking Volker to improve the vbcc 68k backend, which is a waste of time for a few thousand Amiga enthusiasts.
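
As a hedged illustration of the kind of peephole meant here (my sketch, not vasm code; the helper is invented), selecting the short immediate form when the value fits:

/* The 68k moveq form encodes an 8-bit signed immediate (-128..127,
   data register destinations only) in 2 bytes instead of move.l's 6,
   so a peephole can rewrite the long form when the value fits. */
#include <stdio.h>

static const char *pick_move(long imm) {
    return (imm >= -128 && imm <= 127) ? "moveq" : "move.l";
}

int main(void) {
    long tests[] = { 0, 100, 127, 128, -128, -129, 70000 };
    for (unsigned i = 0; i < sizeof tests / sizeof *tests; i++)
        printf("#%ld -> %s\n", tests[i], pick_move(tests[i]));
    return 0;
}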

As it is now I think all 68k effort is basically a waste of effort. ;P

Quote:

Ok. It will probably be preferable to do base register update and then pre-decrement. It's not as easy to read but isn't too bad. I may not even keep the base register update addressing mode with added 64 bit support though.

Or you could just keep the bit reserved and decide in the future. As long as a future 68k isn't a mess of continuous updates like the x86...

Quote:

It's difficult to know what numbers to plug in to the equations. I suspect running out of registers is less common.

registers available | load/store %
24 | 28.21%
22 | 28.34%
20 | 28.38%
18 | 28.85%
16 | 30.22%
14 | 31.84%
12 | 34.31%
10 | 41.02%
8 | 44.45%

Source: "High-Performance Extendable Instruction Set Computing", MIPS ISA

Each register spill from running out of registers requires at least 2 load/stores. Going from 8 to 16 registers reduces the load/store percentage by 14.23%, but from 16 to 24 registers by only 2.01%. The overhead cost of RISC running out of registers makes it pretty easy to see that register spills are common with 8 registers and already uncommon at 16 (I would guess an out-of-registers percentage of less than 10%, and probably less than 5%, with 16 registers). This is older, "various benchmark" code, but it probably doesn't vary too much from typical code today. CISC ISAs are generally register misers compared to RISC ISAs, which may be partially offset by the 68k register split. In any case, the 68k out-of-register overhead is likely much lower (I would expect the load/store percentage increase when out of registers to be half that of RISC). From various stats I have seen of 68k code, the 68k has low memory traffic and is mostly consistent whether optimizing for performance or size (unlike the x86/x86_64). Your own equations make 16 registers look adequate for common code as well.

Yes, but those equations are very skewed towards the CISC case. I'm unsure whether the logic/math checks out either, especially given the (extreme) blunder about the ADD instruction above. :(
I just wanted to illustrate that one shouldn't take something as obvious without trying to model it.
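
For reference, the deltas quoted above follow directly from the table; a minimal C check:

/* Recomputing the quoted deltas from the MIPS load/store table above. */
#include <stdio.h>

int main(void) {
    int   regs[] = { 8, 10, 12, 14, 16, 18, 20, 22, 24 };
    float pct[]  = { 44.45f, 41.02f, 34.31f, 31.84f, 30.22f,
                     28.85f, 28.38f, 28.34f, 28.21f };
    printf("%d -> %d registers: %.2f%% fewer load/stores\n",
           regs[0], regs[4], pct[0] - pct[4]); /* 14.23 */
    printf("%d -> %d registers: %.2f%% fewer load/stores\n",
           regs[4], regs[8], pct[4] - pct[8]); /* 2.01 */
    return 0;
}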

Quote:

Modern mid performance RISC CPUs have longer pipelines comparable with the 68060 so there is little branch performance difference. Yes, simple RISC CPUs can be smaller than CISC but many of them will not have 32 registers because of limited resources.

But then you are comparing processors designed with different technology at different times, with different design choices and different performance. Pipeline length is longer in mid performance RISC not because they are RISC but because they are expected to clock much higher than the 68060. The branch predictors have therefore grown in size and complexity, with the result that a modern processor has a lower branch mispredict ratio than early processors and also a lower effective branch mispredict penalty.

So one either has to look at contemporary designs or try to design and compare a modern 68k version.

Quote:

68k | 16x32
x86 | 8x32
x86_64 | 16x64
z/Architecture | 16x64
68k_64 | 16x64

Modern CISC integer register files look to be more medium sized to me. Specific CPU designs use more registers internally. Other units use larger register files.

I meant microarchitectural registers.
x86 (Intel Core/Yonah): >40 "registers" (Reorder buffer) + 8 (+ some other). 48 registers minimum.
AMD64 (AMD Ryzen): 168 registers.
z14: ??, 16 architectural - upper 32 bits can sometimes be used as a separate register.
68k64: 32 architectural via prefix, probably a minimum of 64+64 physical registers.

This is to show that the physical costs of large register files are there even for CISC designs if they are high performance. A high performance in order design may also use more physical registers than architecturally visible.

Quote:

The 68060 pipeline is generally considered to be an 8 stage design (worst case branch mis-prediction is 8 cycles). The Alpha 21164 has a 7 stage integer and 10 stage FPU pipeline. 68060 instructions which hit in the cache are usually single cycle. The Alpha 21164 is an extremely aggressive 4 wide superscalar high performance design with a dual ported L1 DCache compared to the more balanced (up to 3 instruction issue) superscalar 68060 design. Some people thought it was unfair to compare the 68060 to the more aggressive Pentium design but they are closer than the Pentium and Alpha. The energy requirements give a hint at the aggressiveness of the design.

Alpha 21164@300MHz 3.3V, .5um, 9.3 million transistors, 51W max

Pentium@75MHz 80502, 3.3V, 0.6um, 3.2 million transistors, 9.5W max
68060@75MHz 3.3V, 0.6um, 2.5 million transistors, ~5.5W max *1
PPC 601@75MHz 3.3V, 0.6um, 2.8 million transistors, ~7.5W max *2

There could have been 3 68060 CPUs operating in parallel using fewer transistors, or 9 68060 CPUs in parallel for less than the energy usage of the Alpha 21164, and that is with the Alpha on a smaller process. Too bad Motorola was not designing multi-core CPUs back then.

The 21164 is very aggressive. However, I wanted to show that a contemporary RISC on a comparable manufacturing process didn't just have a similar load-to-use latency in cycles, but also that this didn't come at the expense of frequency. That's all.
The 68060 could of course have been made more aggressive, but I doubt it would be near the 21164 frequencies even with a higher thermal, transistor, and design budget. The DEC Alpha architecture was designed to clock high from the beginning.

IIRC the 68060 75MHz used a 0.42um process?

Quote:

Does single cycle dependency/hazard checking become a problem with wider in-order superscalar issue?

A one-wide (scalar) RISC pipeline needs to check at most two input sources, and even that may be optional: if every instruction executes in one stage, the dependency check isn't needed, as the result will always be available or the pipeline will be stalled waiting for load data.
The simplest two-wide superscalar has to check dependencies for the first pipeline (2 checks) and 4 for the second (the same as the first, plus checks against the first instruction). A three-wide design needs 2+4+6 dependency checks, etc.

In a practical design this can be worse than it sounds, as dependencies have to be checked before execution (and probably even before routing instructions to pipelines), which requires at least partially decoded instructions. CISC and some "RISC" (read: ARM) have condition codes as additional inputs/outputs.
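
A minimal C sketch of that growth, assuming two register sources and one destination per instruction and ignoring condition codes:

/* Count the operand comparisons an in-order issue group of 'width'
   instructions needs: each instruction makes 2 baseline operand checks
   plus 2 comparisons against the destination of every earlier
   instruction in the group (2, 4, 6, ... as described above). */
#include <stdio.h>

int main(void) {
    for (int width = 1; width <= 4; width++) {
        int checks = 0;
        for (int i = 0; i < width; i++)
            checks += 2 + 2 * i;
        printf("%d-wide issue: %d dependency checks\n", width, checks);
    }
    return 0;
}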

Quote:

I found some good uses for the unused 68k 6 bit EA encodings. They allow the compressed immediates as well as a fast (d32,pc) addressing mode, which is very useful for a 64 bit ISA. I think 16 registers will be fine. I would rather focus on good 64 bit support which, IMO, is where many other 64 bit ISAs are lacking.

I suggest:
#imm.w
#imm.q
(d32,PC) or possibly something else, 64 bit absolute?

matthey 
Re: 68k Developement
Posted on 11-Oct-2018 1:11:01
#423

Quote:

megol wrote:
I can only blame mental exhaustion; a day or two before writing that, I had actually looked at the ADD encoding. Almost scary actually. :/

I agree with you 100%.


It happens to all of us. I know you know about ADD. Try a little sleep.

Quote:

As it is now I think all 68k effort is basically a waste of effort. ;P


That was the conclusion I finally came to after my efforts over the years to bring the 68k and Amiga back.

Quote:

Or you could just keep the bit reserved and decide in the future. As long as a future 68k isn't a mess of continuous updates like the x86...


I certainly hope an enhanced 68k would not grow fat like x86/x86_64. I would prefer to see more standardization and less change in an ISA.

Quote:

I meant microarchitectural registers.
x86 (Intel Core/Yonah): >40 "registers" (Reorder buffer) + 8 (+ some other). 48 registers minimum.
AMD64 (AMD Ryzen): 168 registers.
z14: ??, 16 architectural - upper 32 bits can sometimes be used as a separate register.
68k64: 32 architectural via prefix, probably a minimum of 64+64 physical registers.

This is to show that the physical costs of large register files are there even for CISC designs if they are high performance. A high performance in order design may also use more physical registers than architecturally visible.


Some of the internal registers have a higher cost than the ISA-visible registers, but they are not necessary on lower performance CPU designs. Large register files can become undesirable on mid to low performance and embedded CPUs. Look no further than the Tabor CPU, which consolidated register files, and it is closer to mid performance.

Quote:

The 21164 is very aggressive. However, I wanted to show that a contemporary RISC on a comparable manufacturing process didn't just have a similar load-to-use latency in cycles, but also that this didn't come at the expense of frequency. That's all.
The 68060 could of course have been made more aggressive, but I doubt it would be near the 21164 frequencies even with a higher thermal, transistor, and design budget. The DEC Alpha architecture was designed to clock high from the beginning.


Neither the 68060 nor the Pentium could have achieved Alpha 21164 frequencies at that time. The inefficiency of high clock frequencies, the spartan ISA (which didn't originally support byte sizes) and the high cost of achieving such extreme frequencies ended up being the downfall not only of the Alpha CPU line but also of DEC. Alpha CPUs were the fastest for a time and rocked the world of CPU design, but some of the best CPU designers of the time made some big mistakes.

Quote:

IIRC the 68060 75MHz used a 0.42um process?


The original 68060 was fabricated on a 0.6um process, but there was a die shrink to 0.42um with rev 6. I'm not so sure there was a full 68060 rated at 75MHz, although the rev 6 could easily have been rated that high (my 68060@75MHz runs cool, and a stable 90MHz to 105MHz is common for overclockers).

Quote:

I suggest:
#imm.w
#imm.q
(d32,PC) or possibly something else, 64 bit absolute?


#imm.w is excellent. Something like 75-80% of integer immediates fit in 16 bits.

#imm.q is probably unnecessary with a 64 bit 68k ISA as the already existing #imm uses the operation size and would need no sign extension. It would be possible to sign extend all 32 bit operations like x86_64. This has some performance advantages, like avoiding partial register writes, but loses the advantage of 32 bit sizes when wanted, and it is inconsistent with existing ISA behavior for data registers. The 68k address registers would sign extend like x86_64 (actually more like RISC, as all sizes less than the register width are size extended), which is consistent with the 68k ISA. From cdimauro's data, we can see that x86_64 is likely less than efficient with immediates.

Size | Count
IMM32 | 53862
IMM16 | 8456
IMM64 | 2345

IMM16 should be more common than IMM32, but IMM16 is not sign extended with x86_64. I believe a sign extended #imm.w would allow many (perhaps most) of the immediates to move from IMM32 to IMM16, saving 2 bytes of code each.
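
A small C sketch of the criterion behind that estimate (sample values are mine): an immediate can move to a sign-extended #imm.w exactly when it survives the 16-bit round trip:

/* An immediate fits a sign-extended #imm.w when it lies in the
   16-bit signed range -32768..32767. */
#include <stdint.h>
#include <stdio.h>

static int fits_imm16(int32_t v) {
    return v >= INT16_MIN && v <= INT16_MAX;
}

int main(void) {
    int32_t samples[] = { 0, 1, -1, 255, -32768, 32767, 32768, 100000 };
    for (unsigned i = 0; i < sizeof samples / sizeof *samples; i++)
        printf("%7d fits #imm.w: %s\n", (int)samples[i],
               fits_imm16(samples[i]) ? "yes (saves 2 bytes)" : "no");
    return 0;
}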

(d32,PC) is excellent as fast and short PC relative is necessary since absolute addressing is now 8 bytes. This mode allows a 4 GiB PC relative range which should be adequate for even large programs.

A new absolute addressing mode is an option or perhaps replacing the word sized absolute addressing (xxx).W in 64 bit mode. Existing code could easily be converted to (xxx).L or (xxx).Q. The Amiga should be able to do away with most absolute addresses. PC relative and PIC code are much better, especially for 64 bit.

cdimauro 
Re: 68k Developement
Posted on 11-Oct-2018 9:45:33
#424

@matthey Quote:
matthey wrote:
Quote:
cdimauro wrote:
That's impressive. I wouldn't have expected such a high frequency for "address registers" instructions.

Take a look at the "Instruction Mix" for SPEC INT92 on page 55 of the following pdf.

https://www.cl.cam.ac.uk/teaching/0405/CompArch/mynotes.pdf

Although the chart is for RISC, on the 68k those loads and stores become either a MOVE or part of a reg-mem instruction. Of the top 12 most frequent instructions, only OR, some shifts, AND and MUL/DIV require a data register.

Ah, OK, now it's clear. I thought that what you reported was some study/analysis that you had specifically made on 68K executables, and I was interested in the weight of address-register instructions.
Quote:
These notes have some excellent information. There is a small code density comparison chart on page 52 which includes the 68k. Despite the handicap of a statically linked libc, the 68k had the smallest programs in all 3 benchmarks, even though the x86 had better code density in one.

I see, and that's expected: 68K always shows a very good code density.

It's also very interesting for me, because coincidentally I've collected statistics about GNU's cc1.exe, cc1plus.exe, and lto1.exe executables (which are all 32-bit, with MinGW), and the 32-bit version of my ISA shows around -15% instruction size with these exes (they range from 900K to 1.23M disassembled instructions; 1.23M for cc1plus).
Quote:
Quote:
I briefly took a look at the BJX1 page, but I find it quite a complex ISA. I wonder how easy it can be to implement.

It is not so simple anymore, and some flaws of SuperH are difficult to overcome. The 68k would be minimally more complex to implement and easier to enhance, IMO.

The decoder is painful, unfortunately. And double indirect memory modes create problems, especially if both are used on MOVE Mem,Mem instructions (which is the worst case).
Quote:
The SuperH results are marketing literature and even the EISC results may be marketing literature disguised as a scientific paper. Even scientists have biases and incorrect conclusions. Good papers have good methodologies and documentation that make the information valuable anyway. Let's look at some good code density studies.

SPARC16: A new compression approach for the SPARC architecture
https://www.researchgate.net/publication/221306454_SPARC16_A_new_compression_approach_for_the_SPARC_architecture

16-bit Vs. 32-bit Instructions For Pipelined Microprocessors
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.4647&rep=rep1&type=pdf

The biases are still there.

"Since the capacity of a processor to execute instructions typically exceeds the capacity of a memory to provide them, efficiency in the encoding of instruction information can be expected to have definite hardware and/or performance costs. Such considerations for many years supported the development of CISC processors."

Good! Advantage CISC.

"CISC instructions provide relatively compact encodings of computations, but this comes at the cost of complex decoding and execution, often requiring multiple processor cycles per instruction."

Variable length encodings are usually worth the minor cost and most modern CISC instructions are single cycle.

"These drawbacks have motivated widespread adoption of the RISC paradigm, which in pure form employs only simple instructions which can be decoded easily, execute in a single machine cycle, and facilitate pipelining of the processor."

The pure form of RISC is practically dead, and modern CISC hybrids can easily be pipelined.

"With the use of instruction caching and advanced compiler technology, RISC machines can provide significant performance advantages over CISC machines."

Modern CISC can use caching and advanced compiler technology too. It may even have a performance advantage from caching with reg-mem operations. Moving too much CPU complexity into the compiler results in a performance decline. Where is the RISC performance advantage?

"Moreover, architectural trends, such as parallel-issue machines, multiprocessors, and deeply pipelined machines tend to increase rather than decrease concern over instruction traffic as a performance bottleneck."

On average, the study finds 16 bit encodings decrease memory traffic by about 35% while increasing path lengths by about 15%. Longer path lengths (more instructions) are common for compressed RISC encodings, as is increased data memory traffic, which was not measured in this study.

"In current implementation technology, the second-order benefits of a denser 16-bit encoding can easily exceed the path length reduction achieved with the 32-bit format."

The 68k can have path lengths (instruction counts) smaller than most 32 bit RISC formats and less memory traffic than most compressed or 16 bit RISC formats. I expect the 68k code density is still superior, though the study did not compare it. "RISC machines can provide significant performance advantages over CISC machines", but it leaves me wondering how.

There are so many of these compressed RISC encoding research papers which never looked beyond the code density of CISC as a target for RISC compression. How could so many scientists and researchers be so ignorant of CISC advantages? Where is the CISC research? Regardless, there is some useful information provided.

I agree with you, and I've already said it before: IMO CISC research was "banned" in favor of RISCs. RISCs which, by now, look more and more like CISCs...

Interesting papers, anyway, especially the "16-bit Vs. 32-bit Instructions" one, because there's a nice comparison of the DLX ISA with some variants which use a combination of only 16 registers and only 2 operands. It shows that using 32 registers and 3 operands (while keeping the same 32-bit opcode format) gives an advantage in both code density and path length (especially the latter). Regarding register usage, there are some cases where the 16-regs + 3-op version gave better code density results than the 32-regs + 3-op version, but overall the latter does better.

This is a good indication of possible benefits coming from a CISC design which implements those features/instructions.
Quote:
Quote:
cdimauro wrote:
OK, but those are special cases. Usually libraries require A6 for their base, because they need to access their globally-shared data. They also don't know where the library base will be allocated, and that's particularly true for libraries which stay in ROM.

Take exec.library, for example, which is the worst case: how should it work without using A6?

Libraries in ROM would be the exception. Library sections and structures loaded into memory can all be merged. There are some unanswered questions like whether it would be better to reduce the kickstart size for easier maintenance as ThoR has suggested (requires some tricks for compatibility) or enlarge it so more of the OS can be write protected.

OK, now it's clear. But this will be a new library model to implement.
Quote:
Quote:
Absolutely. As I've already stated, my ISA uses longer opcodes, so it's clear that code density will be affected when using them, although I have some mitigations (more features can be enabled at the same time, saving more instructions and/or registers). It's important to remark that using those features (prefixes or long instructions) you also get other benefits: a lower instruction count and/or less memory traffic.

True. Prefixes would likely reduce instruction counts and memory traffic while reducing code density, increasing the average instruction length and adding a small amount of complexity and latency to the decoder. While instruction counts and memory traffic are generally more important performance metrics, I expect the overall performance gains to be small.

It depends on the specific context (it can be better inside loops, for example). Applying a simple peephole optimizer to make use of a few features of my ISA, I've observed an instruction count reduction of around 5%. But more reduction might come from using 32 registers, 3 operands, and such longer opcodes (which enable sign/zero extension, flags modification, cache evictions, etc.).
Quote:
Quote:
cdimauro wrote:
I reviewed the opcode table again, looking at Motorola's 68000 manual. It took me a while, because the list is badly organized, and it's not easy to figure out which 16-bit "slots" were used (especially which ones are left free to be used).

Anyway, I'm decisively convinced that my idea to have 16 data registers and 8 address registers, with a clear separation between the two, is doable by shuffling some instructions, and it should maintain a very similar code density (with margins to improve it, using the new data registers).

Feel free to try.

That's my plan, but first I have to add the prologue/epilogue optimization to the script which I've created to collect the statistics for my ISA.

cdimauro 
Re: 68k Developement
Posted on 12-Oct-2018 7:23:46
#425

@matthey Quote:

matthey wrote: Quote:

megol wrote:
As it is now I think all 68k effort is basically a waste of effort. ;P

That was the conclusion I finally came to after my efforts over the years to bring the 68k and Amiga back.

Who knows: maybe it's just been bad luck 'til now. I still believe that something can change in the CPU/ISA panorama, otherwise I would never have invested so much time in my ISA.
Quote:
Quote:
I meant microarchitectural registers.
x86 (Intel Core/Yonah): >40 "registers" (Reorder buffer) + 8 (+ some other). 48 registers minimum.
AMD64 (AMD Ryzen): 168 registers.
z14: ??, 16 architectural - upper 32 bits can sometimes be used as a separate register.
68k64: 32 architectural via prefix, probably a minimum of 64+64 physical registers.

This is to show that the physical costs of large register files are there even for CISC designs if they are high performance. A high performance in order design may also use more physical registers than architecturally visible.

Some of the internal registers have a higher cost than the ISA-visible registers, but they are not necessary on lower performance CPU designs. Large register files can become undesirable on mid to low performance and embedded CPUs. Look no further than the Tabor CPU, which consolidated register files, and it is closer to mid performance.

Some embedded CPUs have 32 registers. BA21 is one example, but another notable one is the Atmel AVR.
Quote:
Quote:
IIRC the 68060 75MHz used a 0.42um process?

The original 68060 was fabricated on a 0.6um process, but there was a die shrink to 0.42um with rev 6. I'm not so sure there was a full 68060 rated at 75MHz, although the rev 6 could easily have been rated that high (my 68060@75MHz runs cool, and a stable 90MHz to 105MHz is common for overclockers).

According to Wikipedia https://en.wikipedia.org/wiki/Motorola_68060
"The 68060 was introduced at 50 MHz on Motorola's 0.6 µm manufacturing process. A few years later it was shrunk to 0.42 µm and clock speed raised to 66 MHz and 75 MHz."

That can explain why such a low power consumption (~5.5W max) is reported compared to the Pentium and PowerPC (which were using a 0.6µm process).

Other data reported there:

"~88 MIPS @ 66 MHz
~110 MIPS @ 75 MHz
~36 MFlops @ 66 MHz"

However, the 68060 can only issue one FPU instruction per cycle, so I think the last figure should be 33 MFlops @ 66 MHz.

Also:

"However, a significant difference is that the 68060 FPU is not pipelined and is therefore up to three times slower than the Pentium in floating point applications. In contrast to that, integer multiplications and bit shifting instructions are significantly faster on the 68060.
[...]
Against the Pentium, the 68060 can perform better on mixed code; Pentium's decoder cannot issue an FP instruction every opportunity and hence the FPU is not superscalar as the ALUs were. If the 68060's non-pipelined FPU can accept an instruction, it can be issued one by the decoder. This means that optimizing for the 68060 is easier: no rules prevent FP instructions from being issued whenever was convenient for the programmer other than well understood instruction latencies. However, with properly optimized and scheduled code, the Pentium's FPU is capable of double the clock for clock throughput of the 68060's FPU."

But the Pentium page regarding the new features introduced is very interesting as well: https://en.wikipedia.org/wiki/P5_(microarchitecture)#Major_improvements_over_the_80486_microarchitecture
In particular:

"64-bit external databus doubles the amount of information possible to read or write on each memory access and therefore allows the Pentium to load its code cache faster than the 80486; it also allows faster access and storage of 64-bit and 80-bit x87 FPU data.

Much faster floating point unit. Some instructions showed an enormous improvement, most notably FMUL, with up to 15 times higher throughput than in the 80486 FPU. The Pentium is also able to execute a FXCH ST(x) instruction in parallel with an ordinary (arithmetical or load/store) FPU instruction.

Four-input address-adders enables the Pentium to further reduce the address calculation latency compared to the 80486. The Pentium can calculate full addressing modes with segment-base + base-register + scaled register + immediate offset in a single cycle; the 486 has a three-input address-adder only, and must therefore divide such calculations between two cycles."

Other features sub-section is also interesting.

So, a lot of stuff introduced with the Pentium.
Quote:
Quote:
I suggest:
#imm.w
#imm.q
(d32,PC) or possibly something else, 64 bit absolute?

#imm.w is excellent. Something like 75-80% of integer immediates fit in 16 bits.

#imm.q is probably unnecessary with a 64 bit 68k ISA as the already existing #imm uses the operation size and would need no sign extension.

I agree: it's not needed.
Quote:
From cdimauro's data, we can see that x86_64 is likely less than efficient with immediates.

Size | Count
IMM32 | 53862
IMM16 | 8456
IMM64 | 2345

IMM16 should be more common than IMM32, but IMM16 is not sign extended with x86_64. I believe a sign extended #imm.w would allow many (perhaps most) of the immediates to move from IMM32 to IMM16, saving 2 bytes of code each.

Consider another thing: IMM16 covers only a small range of integer data. I haven't included all the IMM statistics, but the vast majority of values are already covered by immediates smaller than 16 bits, so they are not counted under IMM16.

Whereas IMM32 collects all integer values which are < -32768 or > 32767.
Quote:
(d32,PC) is excellent as fast and short PC relative is necessary since absolute addressing is now 8 bytes. This mode allows a 4 GiB PC relative range which should be adequate for even large programs.

*
Quote:
A new absolute addressing mode is an option or perhaps replacing the word sized absolute addressing (xxx).W in 64 bit mode. Existing code could easily be converted to (xxx).L or (xxx).Q. The Amiga should be able to do away with most absolute addresses. PC relative and PIC code are much better, especially for 64 bit.

Absolutely (!).

megol 
Re: 68k Developement
Posted on 12-Oct-2018 21:26:54
#426

@matthey
@cdimauro
The reason for including the #imm.q mode would be to avoid prefixes ever changing instruction length, also allowing quad operations to use the #imm(.l) simply by using the normal #imm encoding. Quad is encoded the same as a long operation with a prefix changing the operation size; to the main decoder the default size is long.

One could possibly use a #imm.q to automatically change to quad size without an additional prefix, don't know how useful that would be.

Don't know if this makes much sense, logically it's a waste of space as 64 bit data is rare.

Hypex 
Re: 68k Developement
Posted on 13-Oct-2018 15:08:42
#427

@cdimauro

Quote:
64-bit (long) mode still uses segmentation.


Oh no!

Quote:
It's not really important. Do you care about nibble alignment with bitplanes? No. And you don't have to care about strange packed depths, like using 3 bits per CLUT entry.


I don't care about nibble alignment with bitplanes because I don't need to think about bit alignments. All the pixel bits are split up into the bitplanes.

With chunky it should be aligned to the nearest multiple of two for management. Otherwise a packed 3 bit mode would surely use less memory than being forced to a 4 bit alignment. But it would be a pain in the arse to deal with. Shifting and masking.
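
For what it's worth, the shifting and masking is short in C; a hypothetical 3 bits-per-pixel read, assuming LSB-first packing (an Amiga-style layout would be MSB-first):

/* Read pixel x from a 3-bits-per-pixel packed scanline. Reads 16 bits
   so a pixel straddling a byte boundary still works. Illustrative
   only; bit order and sample data are my assumptions. */
#include <stdint.h>
#include <stdio.h>

static unsigned get_pixel3(const uint8_t *line, unsigned x) {
    unsigned bit  = x * 3;       /* bit offset of the pixel */
    unsigned byte = bit >> 3;
    unsigned sh   = bit & 7;
    unsigned v    = line[byte] | ((unsigned)line[byte + 1] << 8);
    return (v >> sh) & 7;
}

int main(void) {
    uint8_t line[4] = { 0x88, 0xC6, 0xFA, 0x00 }; /* 8 pixels: 3 bytes + pad */
    for (unsigned x = 0; x < 8; x++)
        printf("pixel %u = %u\n", x, get_pixel3(line, x));
    return 0;
}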

Quote:
No, even interleaved bitmaps are very different and absolutely not as efficient as packed modes using the same depth.


I did think about it. With a 320 width in mind, an 8-bit chunky map will use 320 bytes per line in a row. An 8-bit interleaved bitmap will also use 320 bytes per line, for the whole row. Formatted differently. Mind you, I'm just thinking of space here, and having all the bytes for each row together in one block.

Quote:
No need to dispute. The dual playfield mode relies on bitplanes because on Amiga you only have them and you can use nothing else.


Somehow saimo managed to program two 256 colour byte planes on the Amiga and be able to blend them together with rotation.

Quote:
But to do exactly the same (and much better IMO) with packed graphic, you just need 2 packed "planes" and define the proper depth for each one. It's even more flexible, because you might define the background plane with more bits/depth, and the foreground one less, just to give an example.


There are some other tricks Amiga bitplanes do, like splitting the scroll offsets and other trickery to get more layers on screen. I don't recall it being a major problem for most games. And bitplanes were still common at the time. Until the 90s.

Quote:
The problem is that we, as Amiga coders, have grown up with the bitplane model in mind, and have only seen packed/chunky graphics as a single plane with only power-of-2 depths. If you remove this preconception, you'll see the benefits of packed graphics in a more general way.


At the time, I recall there was CGA and EGA, which by comparison looked lame. So yes, in my mind there is a preconception to think about those video adaptors, when thinking chunky.

The Amiga did get chunky in RTG cards. Usually at 8-bit depth. But no fancy dual playfields or sprites or copper lists. No wonder it didn't take off.

Quote:
Plain simple and ultra efficient compared to the ENORMOUS waste of reusing the mask for EVERY bitplane on the Amiga when you have to cookie-cut graphic.


Usually reusing something is an efficient means of working.

Quote:
Whatever solution you use, either CPU or Blitter, you'll end up wasting space and bandwidth. Do your math.


I recall one programmer said bitplanes were the best of all, since they could provide the benefits of both worlds. And could also enable some form of chunky. I don't recall him staying around to prove his point. But I was interested in how he would do that. Still, bitplanes provide a challenge and some creative thinking, versus doing things the easy way, which may lead to a lazy brain.

Hypex 
Re: 68k Developement
Posted on 13-Oct-2018 15:18:00
#428

@wawa

Quote:
i have heard that mantra, lets be blunt, many times before. if you really want to artificially restrict yourself trying to use some kind of amiga related system, just for the sake of it, while you would have been better off just using a pc standing next to it, to fulfil your task, so be it.


Then at this point there is no reason to use an OS4 or NG machine at all. Time to give up. But a PC doesn't run my AmigaDOS scripts or Amiga binaries directly, so it's a bit pointless for that.

Quote:
as example, i have heard many times, how problematic it is to compile odyssey on an os4 system, because of cmake, other tools, because it takes long, because this or that. now, i can build odyssey for whatever target im using on linux using its multi threading capabilities. why would i even insist on doing it on non-smp aros, considering the size and complexity of the project?


I don't know, I never suggested compiling Odyssey on an AROS system, even if it has a fast x86. Is that some kind of straw man argument? LOL.

Joking aside. What I am thinking of is the casual user, not an advanced developer. Those not into programming games, but playing them. And using other Amiga software.

Hypex 
Re: 68k Developement
Posted on 13-Oct-2018 15:29:52
#429

@matthey

Quote:
Data registers come first because they are mode=0 and address registers are mode=1 in the EA encoding. Taking the whole 6 bit EA gives numbers from 0-15 for registers r0-r15.


Okay that makes sense. But it still requires intimate knowledge of the encoding. Somewhat.
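
A tiny C rendering of that mapping (the output format is mine):

/* 68k 6-bit EA field: mode 0 selects Dn, mode 1 selects An, so
   (mode << 3) | reg yields a single 0-15 register index. */
#include <stdio.h>

int main(void) {
    for (unsigned mode = 0; mode <= 1; mode++)
        for (unsigned reg = 0; reg <= 7; reg++)
            printf("mode %u reg %u -> %c%u (r%u)\n",
                   mode, reg, mode ? 'a' : 'd', reg, (mode << 3) | reg);
    return 0;
}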

Quote:
Right. The Tabor CPU would likely not have been as bad for a one and done embedded device. It is still rude to expect major support changes from compiler and tools developers when there is such a small advantage to the hardware changes. It seems like a "cheapened" CPU that software guys are expected to make "acceptable".


I agree with you there. You bring up a good point. This makes it hard to compile programs, since generic user mode code should be compilable. But now it will need specific code to be optimised. I suppose this is an old problem we are used to with 68K/020/040 versions. But now there will be vectored, non-vectored and SPE. If not more. I'm not aware of the SDK generating all variants and organising them into neat little directories with an install script.

Quote:
The paper cdimauro linked gives the details (I think I originally gave him the link). See the text that is referring to figure 6. Legacy code compatibility is retained.


Been a little while since I read it but looks like they managed that feature well.

Hypex 
Re: 68k Developement
Posted on 13-Oct-2018 15:43:26
#430

@matthey

Quote:
The prefix is probably the cleanest way to add 8 more data registers to the 68k without re-encoding but maybe re-encoding would be the better choice at that point.


The problem with prefixes on the 68K is that they must have 16-bit alignment, since that is the minimum instruction size. It also doesn't look like it was designed for such a mechanism, since it has whole instructions for performing certain tasks. It works on x86 because that ISA is byte based and prefixes easily fit into its design. I see they are also used for some neat things like setting up loop counters.

Hypex 
Re: 68k Developement
Posted on 13-Oct-2018 15:49:50
#431

@matthey

Quote:
He could probably add dynamic trace support and CPU performance counters (same as Apollo Core?). This would probably take us (people unfamiliar with the program) weeks to add. I would not ask Toni for this without a more serious effort.


I wonder when Vampire emulation will be added to UAE?

cdimauro 
Re: 68k Developement
Posted on 13-Oct-2018 18:00:58
#432

@megol Quote:
megol wrote:
@matthey
@cdimauro
The reason for including the #imm.q mode would be to avoid prefixes ever changing instruction length, also allowing quad operations to use the #imm(.l) simply by using the normal #imm encoding. Quad is encoded the same as a long operation with a prefix changing the operation size; to the main decoder the default size is long.

One could possibly use a #imm.q to automatically change to quad size without an additional prefix, don't know how useful that would be.

Don't know if this makes much sense, logically it's a waste of space as 64 bit data is rare.

That's the point: they are quite rare. It's better to use this encoding for more important EA modes.

cdimauro 
Re: 68k Developement
Posted on 13-Oct-2018 18:23:01
#433

@Hypex Quote:
Hypex wrote:
@cdimauro
Quote:
It's not really important. Do you care about nibble alignment with bitplanes? No. And you don't have to care about strange packed depths, like using 3 bits per CLUT entry.

I don't care about nibble alignment with bitplanes because I don't need to think about bit alignments. All the pixel bits are split up into the bitplanes.

That's wrong: alignment is VERY important (and expensive) with bitplanes. On OCS/ECS machines, a bitplane must be 16-bit aligned, whereas on AGA it must be 32 or (more commonly) 64-bit aligned. The display logic imposes such limits (and the Blitter as well: 16 bits for each operation).

Packed/chunky modes have much lower alignment restrictions (taking into account the depth).
Quote:
With chunky it should be aligned to the nearest multiple of two for management. Otherwise a packed 3 bit mode would surely use less memory than being forced to a 4 bit alignment. But it would be a pain in the arse to deal with. Shifting and masking.

That's what you also do with bitplanes: shifting and masking using the most common operations.

So, what's the point in not accepting "weird" packed modes, like 3-bit depth? It's only due to a mental schema which insists that packed modes must be power-of-two (BTW, 2 and 4-bit packed modes require shifting & masking as well, and they are power-of-two).
Quote:
Quote:
No, even interleaved bitmaps are very different and absolutely not as efficient as packed modes using the same depth.

I did think about it. With a 320 width in mind, an 8-bit chunky map will use 320 bytes per line in a row. An 8-bit interleaved bitmap will also use 320 bytes per line, for the whole row. Formatted differently. Mind you, I'm just thinking of space here, and having all the bytes for each row together in one block.

Now add one pixel horizontally, and do the calculations again. Surprise, surprise: bitplanes are the worst in terms of wasted space, especially for an AGA screen.
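
A minimal C version of that math, assuming AGA's 64-bit per-plane fetch alignment and an 8-bitplane (256 colour) screen:

/* Bytes per scanline for an 8-bit-deep image: chunky vs. 8 bitplanes
   each rounded up to a 64-bit boundary (the AGA case discussed). */
#include <stdio.h>

int main(void) {
    for (int width = 320; width <= 321; width++) {
        int chunky = width;                   /* 1 byte per pixel */
        int plane  = ((width + 63) / 64) * 8; /* round up to 64 bits */
        int planar = 8 * plane;               /* 8 separate bitplanes */
        printf("width %d: chunky %d bytes, planar %d bytes\n",
               width, chunky, planar);
    }
    return 0;
}
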
Quote:
Quote:
No need to dispute. The dual playfield mode relies on bitplanes because on Amiga you only have them and you can use nothing else.

Somehow saimo managed to program two 256 colour byte planes on the Amiga and be able to blend them together with rotation.

I don't think that such operations require / are based on bitplanes. Maybe they use packed graphic, and then require a chunky-to-bitplane conversion (which isn't required on packed graphic, of course).
Quote:
Quote:
But to do exactly the same (and much better IMO) with packed graphic, you just need 2 packed "planes" and define the proper depth for each one. It's even more flexible, because you might define the background plane with more bits/depth, and the foreground one less, just to give an example.

There are some other tricks Amiga bitplanes do, like splitting the scroll offsets and other trickery to get more layers on screen. I don't recall it being a major problem for most games. And bitplanes were still common at the time. Until the 90s.

Yes, but that doesn't mean packed graphics have no advantages as well. You can also do some scroll operations with packed graphics, more or less the same as with the bitplanes on Amigas (even better/easier with 8, 16, 24 and 32-bit packed graphics: you just need to change the pointer to scroll the area).
Quote:
Quote:
The problem is that we, as Amiga coders, have grown up with the bitplane model in mind, and have only seen packed/chunky graphics as a single plane with only power-of-2 depths. If you remove this preconception, you'll see the benefits of packed graphics in a more general way.

At the time, I recall there was CGA and EGA, which by comparison looked lame. So yes, in my mind there is a preconception to think about those video adaptors, when thinking chunky.

I don't remember the CGA, but EGA used bitplanes, like Amigas...
Quote:
The Amiga did get chunky in RTG cards. Usually at 8-bit depth. But no fancy dual playfields or sprites or copper lists. No wonder it didn't take off.

That's simply because RTG on the Amiga is an alien technology.
Quote:
Quote:
Plain simple and ultra efficient compared to the ENORMOUS waste of reusing the mask for EVERY bitplane on the Amiga when you have to cookie-cut graphic.

Usually reusing something is an efficient means of working.

In this case it's exactly the opposite: you're wasting A LOT of (chip) memory bandwidth by reloading the same mask for EVERY bitplane where you need to apply it.
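
A rough C sketch of that cost, with illustrative numbers (a 32x32 pixel, 3-bitplane object; counting only the 16-bit words of mask data fetched):

/* The 1-bit mask itself is 2 words x 32 rows. A planar cookie-cut
   re-fetches it once per bitplane; a packed blit needs it once. */
#include <stdio.h>

int main(void) {
    int mask_words = (32 / 16) * 32; /* 64 words for one 1-bit mask */
    int planes = 3;
    printf("planar: mask fetched %d times = %d words\n",
           planes, planes * mask_words);
    printf("packed: mask fetched once = %d words\n", mask_words);
    return 0;
}
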
Quote:
Quote:
Whatever solution you use, either CPU or Blitter, you'll end up wasting space and bandwidth. Do your math.

I recall one programmer said bitplanes were the best of all, since they could provide the benefits of both worlds.

I don't trust him: I trust math, which clearly proves that packed graphic is superior to bitplanes, except if you need to access specific bits (which on Amiga means single bitplanes).
Quote:
And could also enable some form of chunky.

No chunky is possible with bitplanes.
Quote:
I don't recall him staying around to prove his point. But I was interested in how he would do that.

Don't worry: he can prove nothing. As I said before, I've mathematically proved it on the amigacoding.de forum, and even Gunnar (who was advocating bitplanes) was surprised.
Quote:
Still, bitplanes provide a challenge and some creative thinking, versus doing things the easy way, which may lead to a lazy brain.

Creative people can have challenges with packed graphic as well.

cdimauro 
Re: 68k Developement
Posted on 13-Oct-2018 18:26:48
#434

@Hypex Quote:
Hypex wrote:
@wawa

Quote:
i have heard that mantra, lets be blunt, many times before. if you really want to artificially restrict yourself trying to use some kind of amiga related system, just for the sake of it, while you would have been better off just using a pc standing next to it, to fulfil your task, so be it.

Then at this point there is no reason to use an OS4 or NG machine at all. Time to give up. But a PC doesn't run my AmigaDOS scripts or Amiga binaries directly, so it's a bit pointless for that.

It's also possible on PCs as well, with proper software (a "virtualizer").

@Hypex Quote:
Hypex wrote:
@matthey

Quote:
He could probably add dynamic trace support and CPU performance counters (same as Apollo Core?). This would probably take us (people unfamiliar with the program) weeks to add. I would not ask Toni for this without a more serious effort.

I wonder when Vampire emulation will be added to UAE?

Ask Toni Wilen.

megol 
Re: 68k Developement
Posted on 13-Oct-2018 18:30:20
#435 ]
Regular Member
Joined: 17-Mar-2008
Posts: 355
From: Unknown

@cdimauro
Quote:

cdimauro wrote:
That's the point: they are quite rare. It's better to use this encoding for more important EA modes.

Yes. But is that 68k extended to 64 bit? Philosophy. :)

(Not feeling well so while I read this thread don't expect long replies)

JimIgou 
Re: 68k Developement
Posted on 13-Oct-2018 18:42:29
#436 ]
Regular Member
Joined: 30-May-2018
Posts: 114
From: Unknown

@matthey

Quote:
That was the conclusion I finally came to after my efforts over the years to bring the 68k and Amiga back.


As a devoted MorphOS user, I'd like to see us focus on POWER9 (and its successors) rather than move to x64 once PPCs quietly die.

But I don't agree that the 68K efforts are a complete waste, as they will allow us to have full backward compatibility without emulation (JIT or otherwise).

And more powerful 68K cores would enable backporting of OS4 and MorphOS, finally unifying our efforts.

Oh, and btw, even if none of this succeeds, I anticipate being able to run OS3.1-3.9, OS4, and MorphOS all via QEMU on a Raptor Talos system (concurrently) in the near future. So I feel I'm staying true to my roots.

I should be able to run x64 OSes and apps in a similar manner as well.

Hypervisor-enabled, bi-endian processors are too cool.

Finally, as to the previously mentioned SuperH, I thought that was a great processor. And the re-implemented H2 core (BSD-licensed as J2) looks promising. As the H2+ and H4 patents expire, this line of open cores will expand, and all appear to have advantages over the 68K.

@megol

Quote:
Yes. But is that 68k extended to 64 bit?


Is 64-bit absolutely essential? I still have some 32-bit OSes running, even when they are on 64-bit processors (i.e. 32-bit Windows on an i7, MorphOS on a PowerMac G5).

Our current problem isn't 64-bit capability, it's that our legacy is 31-bit addressing, not 32, limiting us to 2 GB instead of 4 GB.

Just by re-working the software and OSes we can double our memory capability, all on the same processors.

Last edited by JimIgou on 13-Oct-2018 at 06:48 PM.

hth313 
Re: 68k Developement
Posted on 14-Oct-2018 7:57:05
#437 ]
Regular Member
Joined: 29-May-2018
Posts: 159
From: Delta, Canada

@JimIgou

Quote:

JimIgou wrote:
Is 64 bit absolutely essential? I still have some 32 bit OS' running, even when they are on 64 bit processors (IE - 32 bit Windows on an i7, MorphOS on a PowerMac G5).

Our current problem isn't 64 bit capability, its that our legacy is 31 bit addressing, not 32, limiting us to 2 GB instead of 4GB.

Just by re-working the software and OS' we can double our memory capability, all on the same processors.


For me it is essential. 32-bit Windows has a similar 2 GB limitation, but an application can be linked to use 3 GB (the large-address-aware option), which was something we did at my previous job as customers were running out of memory. It was a cheap short-term solution. I believe Windows also has this limitation per process, not for all processes combined like AmigaOS?

Linking libclang (a C/C++ compiler front end and more, written in C++) consumes 6 GB on my 64-bit Linux. I have another tool that can currently need 3.5 GB (it may very well grow in the future), but this is of course on a 64-bit OS, which means it also consumes a bit more memory compared to a 32-bit OS.

One of my colleagues could not understand why I have 80 tabs open in my web browser (I use them). I have not checked how much memory it takes, but a RPi with 1 GB can handle 3-4 tabs while also running a somewhat bare-bones Linux in that memory.

That is just me, a fairly normal developer. Then there are people doing in-memory databases; try telling them that 32-bit is enough...

Of course, some people can live with a 32-bit OS, but there are many that cannot (or are coming very close to running out of 32-bit space). Whether it is absolutely essential can be debated, I guess. I do not think that memory consumption will go down in the coming years. Yes, it is ridiculous, but we are also doing things with computers that were not really possible a few years ago.

OneTimer1 
Re: 68k Developement
Posted on 14-Oct-2018 13:06:54
#438 ]
Cult Member
Joined: 3-Aug-2015
Posts: 983
From: Unknown

Quote:

Hypex wrote:

I wonder when Vampire emulation will be added to UAE?


No one wants a downgrade for UAE; the Vampire has no MMU and only a faulty FPU.

OneTimer1 
Re: 68k Developement
Posted on 14-Oct-2018 13:21:40
#439 ]
Cult Member
Joined: 3-Aug-2015
Posts: 983
From: Unknown

@hth313

Quote:

hth313 wrote:

That is just me, a fairly normal developer. Then there are people doing in-memory databases; try telling them that 32-bit is enough...



AmigaOS is restricted to an address space of 2 GB; a 64-bit CPU won't make the OS 64-bit compatible. For a 64-bit system you would need an AmigaOS that was compiled for a 64-bit CPU, something like AROS 64.

But I don't believe a 64-bit AROS could keep AOS-compatible structures for (real) 68k software, and without compatibility with the original AmigaOS all these 64-bit extensions for the CPU would lose their sense.

All this talk about CPU expansion and 64-bit extension of the 68k seems useless. A 64-bit AOS would need UAE-like sandboxes for 68k AOS software. If someone wants this, he can have it under Windows.

This is where Amiga ends: a 64-bit 68k CPU won't give them more compatibility than UAE on Windows; it will only result in lower performance (FPGA) and high system prices (custom hardware).




Hypex 
Re: 68k Developement
Posted on 14-Oct-2018 16:51:57
#440 ]
Elite Member
Joined: 6-May-2007
Posts: 11222
From: Greensborough, Australia

@cdimauro

Quote:
That's wrong: alignment is VERY important (and expensive) using bitplanes.


I thought you were talking about pixel bit alignment, where every LSB to MSB is packed together separately. Not the planes.

Quote:
That's what you also do with bitplanes: shifting and masking using the most common operations.


Good if you only need one colour.

Quote:
So, what's the point in not accepting "weird" packed modes, like 3-bit depth? It's only due to a mental schema which imposes seeing packed modes only as power-of-two (BTW, 2 and 4-bit packed modes require shifting & masking as well, and they are p-o-t).


No, it's because packing odd pixel depths is impractical to work with. Okay, even nibbles can't be plugged in directly since they take up half a byte. But a depth like 3 would have to be shifted and masked out in a format like this:
11122233 34445556 66777888 999AAA00

Do such formats exist? I can only imagine it being used for a scrolling background or similar where it didn't have to be modified.
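For what it's worth, fetching pixel n from a layout like the one above is only a couple of shifts and masks. A minimal sketch in plain C (my assumptions: MSB-first bit order as in the diagram, and a buffer padded with one spare byte so the 16-bit gather never reads past the end):

#include <stdint.h>

uint8_t get3(const uint8_t *buf, unsigned n)
{
    unsigned bit  = n * 3;      /* absolute bit position        */
    unsigned byte = bit >> 3;   /* first byte the pixel touches */
    unsigned off  = bit & 7;    /* bit offset inside that byte  */
    /* Gather 16 bits so a pixel straddling a byte boundary is covered. */
    uint16_t v = (uint16_t)((buf[byte] << 8) | buf[byte + 1]);
    return (uint8_t)((v >> (13 - off)) & 0x07);
}

Writing a pixel is the same idea with a read-modify-write, which is presumably why such depths were rarely used for anything modified per frame.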

Quote:
Now add one pixel horizontally, and do the calculations again. Surprise, surprise: bitplanes are the worst in terms of wasted space, especially for an AGA screen.


321 pixels across is a weird amount. Even 320 is odd, not being a power of two. Yes, I get your point, but no one in their right mind would use such an odd amount.
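For the record, the arithmetic goes roughly like this (my numbers, assuming rows are padded to the fetch unit): a 321-pixel row needs ceil(321/16) = 21 words = 42 bytes per bitplane, so 336 bytes per row for 8 AGA planes; with AGA's widest 64-bit fetch mode the row pads to 384 pixels, i.e. 48 bytes per plane and 384 bytes per row. The same row in 8-bit packed form is 321 bytes plus at most a few alignment bytes, and the planar waste grows with every extra plane.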

Quote:
I don't think that such operations require / are based on bitplanes. Maybe they use packed graphics, and then require a chunky-to-bitplane conversion (which isn't required with packed graphics, of course).


Well, that's no big deal then. Boring.

I had some ideas for a super copper list just for fun, but all the ideas led to true colour, in 12-bit anyway. I don't see any obvious way to program it to do an indexed mode. But in any case it couldn't exactly do better than low-res, so it's not useful as an RTG screenmode.

Quote:
Yes, but it doesn't mean that packed graphics have no advantages as well. You can also make some scroll operations with packed graphics, more or less the same as with the bitplanes on Amigas (even better/easier with 8, 16, 24 and 32-bit packed graphics: you just need to change the pointer to scroll the area).


Makes sense. No scroll offset needed?

What about 1/4 pixel scrolling?

Quote:
I don't remember the CGA, but EGA used bitplanes, like Amigas...


Well what a strange thing to do.

Were they formatted the same? By the bit? Or did they split the planes up into packed data?

Quote:
That's simply because RTG on Amigas is alien technology.


It wasn't integrated. It needed to be in chip RAM, inside the chipset.

But by 1989 there were graphics chipsets that could do 24-bit, blitting operations and other fancy stuff, making the A500 look obsolete. Expensive, maybe. But someone had to engineer it all, while at Commodore they came to a standstill.

Quote:
In this case it's exactly the opposite: you're wasting A LOT of (chip) memory bandwidth by reloading the same mask for EVERY bitplane where you need to apply it.


Is it worse for interleaved BOBs?

In any case the Amiga bitplane sprites and BOBs are better than trying to do a soft sprite on a C16.

Quote:
I don't trust him: I trust the math, which clearly proves that packed graphics are superior to bitplanes, except if you need to access specific bits (which on the Amiga means single bitplanes).


Maybe that's why he disappeared. The AlienF1 game ran well. I don't know how it rendered the screen, but it didn't look like a chunky conversion. VKarting was an interesting approach. I had hope for Breathless but lost it when I saw it was just another chunky conversion, even though it was IMHO a superior engine to Doom.

Quote:
No chunky is possible with bitplanes.


I didn't think so. Not directly. Can't believe they let the last Atari be superior to AGA.

Quote:
Don't worry, he can prove nothing. As I said before, I've mathematically proved it on the amigacoding.de forum, and even Gunnar (who was advocating bitplanes) was surprised.


He needs bitplanes for his SAGA story to continue...

Quote:
Creative people can find challenges with packed graphics as well.


It's just more "normal".

BTW, HP printers support planar. They also support ink colour planes, which are like byte planes. But RGB triplets are more common now.

