Amigaworld.net - The Amiga Computer Community Portal Website

68k Developement


BigD

Re: 68k Developement
Posted on 7-Sep-2018 19:32:51

[ #161 ]

Elite Member

Joined: 11-Aug-2005
Posts: 7560
From: UK

@JimIgou

Quote:
In essence, does everybody have to buy a Vampire?

I think it'll lead to a larger base of 060 software, to be honest. I mean, Beats of Rage is great on Vampire AND Classic 060 accelerators. I think that sort of software is the way forward. Old 060 RTG users will benefit too.

_________________
"Art challenges technology. Technology inspires the art."
John Lasseter, Co-Founder of Pixar Animation Studios

Status: Offline

megol

Re: 68k Developement
Posted on 7-Sep-2018 21:49:58

[ #162 ]

Regular Member

Joined: 17-Mar-2008
Posts: 355
From: Unknown

@matthey

This is the final attempt to post a reply. I've tried 4 times now, but closed down the window, succeeded in erasing the whole thing, updated the browser forgetting about the message, and finally thought I had posted but apparently hadn't! Must be going crazy :/
--
With a simple predecode scheme that tags each word of instruction data with the opcode length (1 or 2 words) plus the length of the immediate or displacement data that follows, finding the position of an address extension word (AEW - brief or full) or of the next instruction is trivial. The problem becomes finding out whether there is an AEW and, if so, how many extension words it needs.

Looking at the first instruction word, one can decode the possible EA field positions in parallel with determining whether, and if so where, EA encodings are valid. Each EA position requires 6 bits to be examined; how many bits of the instruction I don't remember. Still, that is fast.

If this part of the size decoder says no AEWs follow, the size of the instruction is in the predecode data directly; if not, the AEW has to be decoded.

Decoding the size of an AEW requires checking IIRC 8 bits, which is a small piece of logic; this produces at least a value (0 to 4) indicating additional extension words.

So if there is an AEW, it has to be extracted, decoded and added to the original predecode data. The special case of a MOVE with two AEWs can be handled with an extension of this scheme or just be treated as a two-clock decode case reusing the AEW decoder.
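
As a rough illustration of the AEW size decode described above, here is a minimal Python sketch. It assumes the 68020 full-format extension word layout (bit 8 selects brief or full format, bits 5-4 hold the base displacement size, and the low two bits of the I/IS field give the outer displacement size). The function name is made up for this example, and reserved encodings are only partly checked.

```python
def aew_extra_words(ext_word):
    """Return how many 16-bit words follow an address extension word.

    Assumes the 68020 layout: bit 8 = 0 is the brief format (nothing
    follows), bit 8 = 1 is the full format with optional base and outer
    displacements of 0, 1 or 2 words each.
    """
    if not ext_word & 0x0100:
        return 0  # brief format: the 8-bit displacement is in the word itself

    bd_size = (ext_word >> 4) & 0x3   # base displacement size field
    od_size = ext_word & 0x3          # low bits of I/IS: outer displacement size

    if bd_size == 0:
        raise ValueError("reserved base displacement size")

    bd_words = bd_size - 1            # 1 = null, 2 = word, 3 = long
    od_words = max(od_size - 1, 0)    # 0 = no indirection, 1 = null, 2 = word, 3 = long
    return bd_words + od_words


for word in (0x0000, 0x0110, 0x0120, 0x0133):
    print(hex(word), aew_extra_words(word))
```

Only five bits actually steer the result here; the remaining bits of the "8 bits" figure would go to validity checks in a real decoder.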

Adding a prefix would require adding a prefix detector, not a proper decoder at this stage, and replicating the (first) instruction word size decoder. Neither of those requires much logic. Both of the first fetched words would be decoded to find out if and where an AEW is, and the result of the prefix detector chooses which of the decoded results is used for AEW extraction and decode.

IOW I don't see adding a prefix complicating decode significantly. The usefulness of doing that is worthy of discussion however.
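
The prefix handling described above can be pictured as decoding both candidate words and letting the prefix detector act as the select. A minimal Python sketch follows; `is_prefix` and `first_word_info` are hypothetical stand-ins (no real prefix encoding has been defined in this thread), and the 0x71xx pattern in the demo is an arbitrary placeholder.

```python
def select_size_decode(words, is_prefix, first_word_info):
    """Decode both leading words, then pick one result.

    words           -- list of at least two fetched 16-bit words
    is_prefix       -- hypothetical prefix detector (word -> bool)
    first_word_info -- hypothetical first-word size decoder (word -> any)
    Returns (has_prefix, decode result for the real opcode word).
    """
    # In hardware both decoders would run at the same time.
    as_opcode_0 = first_word_info(words[0])
    as_opcode_1 = first_word_info(words[1])

    has_prefix = is_prefix(words[0])
    return has_prefix, (as_opcode_1 if has_prefix else as_opcode_0)


# Demo with placeholder rules (not real 68k prefix encodings).
def demo_is_prefix(word):
    return (word & 0xFF00) == 0x7100


def demo_info(word):
    return {"ea_field": word & 0x3F}


print(select_size_decode([0x7101, 0x2010], demo_is_prefix, demo_info))
print(select_size_decode([0x2211, 0x2010], demo_is_prefix, demo_info))
```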

--
Extras:
One can implement a parallel array of AEW size decoders, one per fetched instruction word (or actually for fetch width + instruction buffer width), which can be a bit faster: decode in parallel with checking whether the AEW size is needed and, if so, which word contains it, followed by a select.
This can also be used for prefixes or even the rest of the size decoding - decode in a massively parallel manner and then let the result trickle down.

Computing the predecode data is probably best done in a parallel way too; it isn't too complicated and is only done when filling an instruction cache line. Branches have to check the lower 8 bits but that's easy (the special cases are all ones or all zeros, which simplifies things a bit); immediate data has to be detected from the instruction type or in an EA field. MOVE, as usual, has to be treated as a special case. This can be pipelined too, as it isn't latency sensitive.

Last edited by megol on 07-Sep-2018 at 09:51 PM.

Status: Offline

kolla

Re: 68k Developement
Posted on 8-Sep-2018 2:12:13

[ #163 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3475
From: Trondheim, Norway

@Kremlar

Software for Vampire? You mean RiVA and a jfif.datatype?

And with "NG Amiga" you mean solely OS4, right?

Last edited by kolla on 08-Sep-2018 at 02:44 AM.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Status: Offline

JimIgou

Re: 68k Developement
Posted on 8-Sep-2018 3:32:29

[ #164 ]

Regular Member

Joined: 30-May-2018
Posts: 114
From: Unknown

@kolla

Quote:
And with "NG Amiga" you mean OS4 solely, right?

Probably not, but that should be the convention.
NG as a blanket term for AROS, OS4, or MorphOS.
And anything "Amiga" being...well Amiga (so "NG" Amiga=OS4).

I don't think AROS or MorphOS users would have a problem with that convention as we position ourselves as alternatives to AmigaOS.

Status: Offline

cdimauro

Re: 68k Developement
Posted on 9-Sep-2018 16:59:41

[ #165 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@megol

Quote:

megol wrote:
@matthey

The newest AVX 512 with the EVEX encoding can have 6 bytes for the main opcode, sure. 4 bytes EVEX + 1 byte opcode + 1 byte modR/M. You aren't likely to see any other prefixes with that given that one of the objectives with the EVEX and the VEX prefix/format is to remove unnecessary prefixes.

Indeed. VEX first and EVEX afterwards basically gave a uniform and homogeneous structure to SIMD opcodes, removing all prefixes and introducing new features, extending the original instructions.

It's a clever design, and a step in the right direction, albeit some encoding space is wasted because they reuse illegal instruction formats of 3 instructions. But this is a constraint of the x86 architecture, which has had no other encoding space available for a long time.
Quote:
I don't really agree with prefixes not being useful for 68k, the advantage with a prefix is that existing instructions needn't be replicated in a new, longer, format to be extended. And a prefix can be both faster and shorter than existing code without having complicated fusion of instructions in the processor.

Yes, but it depends on the ISA.

The x86 ISA is considered a mess, but from the applicability of prefixes perspective it's relatively simple.

In fact we have basically 3 opcode "formats / containers":
- the normal, single byte opcode xx;
- the 0F xx one;
- the 0F 38 xx and 0F 3A xx.

So, there are up to 1024 (a bit less) "base" instructions, which can be further extended / "redefined" using prefixes.

All instructions can be grouped in 4 (macro) categories:
- instructions that operate on a register which is specified;
- instructions that operate on memory using the modR/M;
- instructions that operate on memory using the more complex SIB;
- instructions that don't operate on registers or memory (e.g.: call/jmp with immediates, ret, in/out, cli/sti, etc.).

On top of this, any of those 4 macro-categories can have an immediate following it (it's the last part of the instruction).

Applying one or more prefixes which alter the base instruction (e.g.: more registers, data size, address size, lock, etc.) isn't that complicated once you have worked out which category it belongs to.
What I mean is that if, for example, the prefix gives access to more/new registers, then it's enough to apply this information to the register (first category), the modR/M, or the SIB, because that information is found in standard / fixed places inside the opcode.

Which is no surprise, since this is the 8086's intrinsic nature: an ISA which was explicitly designed to make use of prefixes.

Of course, prefixes have a cost, as you and matt discussed afterwards, but from a frontend perspective the work (identifying prefixes + decoding the base instruction + classifying the instruction category + applying the found prefixes to build the final uop/uops) is a relatively simple task.
Specifically, the instruction classification requires a LUT of 1024 entries with a 2-bit result each (plus another bit to flag whether the instruction has an immediate), which is not that much.
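
To make the size of that classification table concrete, here is a minimal Python sketch of a 1024-entry LUT indexed by opcode map and opcode byte, with 3 bits per entry. The category numbering and the handful of entries filled in are illustrative assumptions; a real table would cover every opcode, and on real x86 the presence of a SIB byte is signalled by the modR/M byte, which this sketch doesn't model.

```python
# Opcode maps ("containers"): one-byte, 0F, 0F 38, 0F 3A -> 4 * 256 entries.
MAP_ONE_BYTE, MAP_0F, MAP_0F38, MAP_0F3A = range(4)

# 2-bit category codes (numbering chosen for this example).
CAT_REG, CAT_MODRM, CAT_SIB, CAT_NONE = range(4)
HAS_IMM = 0b100  # third bit: an immediate follows

lut = [None] * (4 * 256)  # None marks entries this sketch leaves unfilled


def lut_index(opcode_map, opcode):
    """Flatten (map, opcode byte) into a single table index."""
    return opcode_map * 256 + opcode


# A few well-known entries only.
lut[lut_index(MAP_ONE_BYTE, 0x50)] = CAT_REG              # PUSH rAX
lut[lut_index(MAP_ONE_BYTE, 0x01)] = CAT_MODRM            # ADD r/m, r
lut[lut_index(MAP_ONE_BYTE, 0xB8)] = CAT_REG | HAS_IMM    # MOV eAX, imm32
lut[lut_index(MAP_ONE_BYTE, 0xE8)] = CAT_NONE | HAS_IMM   # CALL rel32
lut[lut_index(MAP_ONE_BYTE, 0xC3)] = CAT_NONE             # RET
lut[lut_index(MAP_0F, 0xAF)] = CAT_MODRM                  # IMUL r, r/m


def classify(opcode_map, opcode):
    """Return (category, has_immediate) for a filled-in entry."""
    entry = lut[lut_index(opcode_map, opcode)]
    if entry is None:
        raise KeyError("entry not filled in this sketch")
    return entry & 0b11, bool(entry & HAS_IMM)


print(len(lut) * 3, "bits of table storage")
print(classify(MAP_ONE_BYTE, 0xB8))
```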

Applying a similar method to the 68K is possible, but it's more complicated, because there are more cases:
- 16-bit opcode with EA (and 10 bits for the base instruction), but the EA has several different addressing modes to specify, including an immediate;
- 16-bit opcode with only a data register, or only an address register, or a generic register (data or address);
- 16-bit opcode which references neither memory nor a register;
- 32-bit opcodes with EA, more registers and/or short immediates, which means that more categories should be defined, with a more complex structure.
The first and (especially) the last cases are the most difficult to handle, of course.

In general, I think that some logic should be used to identify particular instruction categories, and maybe separate, smaller LUTs should be used to reduce the implementation cost. There should also be per-category logic to apply the prefix(es) to the specific case (considering also that the EA, where present, defines several sub-cases).
The mem-to-mem MOVE instruction requires special handling, of course, since you have 2 EAs there; plus some instructions have mem-to-mem capabilities as well, albeit limited to specific addressing modes, and this creates other exceptions to be handled.
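
Why MOVE is the odd one out is visible in the field layout. A minimal Python sketch, assuming the standard 68000 encoding where the usual EA sits in bits 5-0 (mode in bits 5-3, register in bits 2-0) and MOVE's destination EA sits in bits 11-6 with register and mode swapped; the function names are made up for this example.

```python
def source_ea(opcode):
    """Return (mode, reg) from the usual EA field in bits 5-0."""
    return (opcode >> 3) & 0x7, opcode & 0x7


def move_dest_ea(opcode):
    """Return (mode, reg) of MOVE's destination EA in bits 11-6.

    Note the swapped order: register in bits 11-9, mode in bits 8-6.
    """
    return (opcode >> 6) & 0x7, (opcode >> 9) & 0x7


def needs_extension_word(mode):
    """True for addressing modes that are followed by extra words."""
    # Modes 5 and 6 are d16(An) and indexed; mode 7 covers absolute,
    # PC-relative and immediate operands.
    return mode in (5, 6, 7)


op = 0x2280  # MOVE.L D0,(A1)
print(source_ea(op), move_dest_ea(op))
```

A prefix that extends register numbers would therefore have to patch two differently ordered fields for MOVE, but only one for most other instructions.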

It's not hard and certainly it's possible to implement, but to me it looks much more complicated compared to the x86 ISA.

Status: Offline

cdimauro

Re: 68k Developement
Posted on 9-Sep-2018 18:03:18

[ #166 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@matthey

Quote:

matthey wrote:
Quote:

megol wrote:
The newest AVX 512 with the EVEX encoding can have 6 bytes for the main opcode, sure. 4 bytes EVEX + 1 byte opcode + 1 byte modR/M. You aren't likely to see any other prefixes with that given that one of the objectives with the EVEX and the VEX prefix/format is to remove unnecessary prefixes.

Yes, I was talking about 6 byte AVX instructions. There must be significant cost to the prefixes if they are wanting to reduce the number of prefixes even as the instruction length grows. From what I have read, the long SIMD instructions have been a performance problem for low end Atom CPUs.

Handling more prefixes and having long instructions are two different things, with different impacts.
Quote:
I hope the 68k doesn't need 6 byte SIMD instructions too. There is a lot of F-line encoding room but a SIMD unit needs plenty, especially with 3 op and many registers.

The F-line has too little space for a decent, modern SIMD unit. Not even completely absorbing both the A and F lines can give you enough encoding space, as I discussed some time ago with HyperX. Unless you want to lower the bar and put some limits on that ISA; then it's possible, but it could be too constrained.
Quote:
Quote:
It seems it doesn't matter anyway. The only 68k core being updated now is AFAIK the Apollo core and while they claim to be working on OoO execution it feels like a dead end. Nice for some semi-retro fun but not for NG developments.

I hope you are not implying that the OoO execution feels like a dead end. OoO for longer latency instructions like integer division and FPU instructions is cheap and should give a modest boost to performance.

Atom almost doubled its performance going from in-order to out-of-order, while keeping the same limit of max 2 instructions decoded & executed per cycle.

Status: Offline

cdimauro

Re: 68k Developement
Posted on 9-Sep-2018 18:24:55

[ #167 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@megol

Quote:

megol wrote:
@matthey

The EVEX format supports more registers and both it and the VEX format are designed to open up more instruction space without needing any type of hacks (like some SSE instructions used).

For VEX (AVX, AVX-2) it's true: there's additional encoding space for new instructions due to the longer 3-byte VEX prefix.

However EVEX (AVX-512) has no such space. There's only a bit which is reported as reserved and should be zero, but it's not clear whether it's part of the legacy BOUND instruction or is really available.

Status: Offline

OneTimer1

Re: 68k Developement
Posted on 9-Sep-2018 18:29:59

[ #168 ]

Super Member

Joined: 3-Aug-2015
Posts: 1258
From: Germany

@WolfToTheMoon

Quote:

WolfToTheMoon wrote:

I'm guessing that the Vampire project is several times more successful in terms of sales (that is without supporting some of the most popular Amigas) than any of the NG Amiga projects...

I don't believe they sold at least 500 of their VampireV2 boards and they will hardly sell 5000.

Status: Offline

OlafS25

Re: 68k Developement
Posted on 9-Sep-2018 19:17:21

[ #169 ]

Elite Member

Joined: 12-May-2010
Posts: 6494
From: Unknown

@OneTimer1

will they?

Hopefully you can also forecast the stock exchange and give us a hint

I think the Vampire boards are certainly potentially limited by the number of Amiga hardware users; the standalone devices are not. So I have no clue how many users of Vampire boards and/or standalone devices there will be in, for example, two years. But at least it is a growing userbase, something the NG camps cannot offer

Status: Offline

cdimauro

Re: 68k Developement
Posted on 9-Sep-2018 19:30:50

[ #170 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@matthey

Quote:

matthey wrote:
Itanium was aimed at the high performance market also.

The long term plan for Intel was to use the EPIC ISA for every market segment. It started for servers / workstations, but failed to reach the desktop market (also because of AMD64, which filled the gap for servers).
Quote:
The long instructions and many prefixes are not difficult for high performance hardware but they are for mid performance and energy efficient hardware. Take the early Intel Atoms for example.

Not necessarily, as I've said before, but see below.
Quote:
Quote:
The instruction fetch rate is approximately 8 bytes per clock cycle on average when running a single thread. The fetch rate can get as high as 10.5 bytes per clock cycle in rare cases, such as when all instructions are 8 bytes long and aligned by 8, but in most situations the fetch rate is slightly less than 8 bytes per clock when running a single thread. The fetch rate is lower when running two threads in the same core.

The instruction fetch rate is likely to be a bottleneck when the average instruction length is more than 4 bytes. The instruction fetcher can catch up and fill the instruction queue in case execution is stalled for some other reason.

https://www.agner.org/optimize/microarchitecture.pdf

This only talks about instruction length: prefixes aren't mentioned.
Quote:
Atom went back to a more energy efficient early in-order architecture which was unsuccessful (they abandoned this market and adopted less energy efficient targets). Long SIMD instructions were one of the reasons why it failed.

They are not: see below. Average SIMD instruction length is around 5 bytes (at least for the executable that I've disassembled and generated statistics for), which isn't a big problem for a 2-way pipeline if the instruction prefetch window is 16 bytes (but even if it's only 8 bytes, 3 bytes are enough to correctly decode the instruction).
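
A back-of-the-envelope check of those numbers, as a minimal Python sketch. The model is an assumption made for this example: sustained throughput is capped both by the issue width and by fetch bandwidth divided by average instruction length, ignoring alignment, queues and stalls.

```python
def sustained_ipc(fetch_bytes_per_clock, avg_instr_bytes, issue_width=2):
    """Crude upper bound on instructions per clock for a fetch-limited core."""
    return min(issue_width, fetch_bytes_per_clock / avg_instr_bytes)


for fetch in (8, 16):
    for length in (3, 4, 5):
        print(f"fetch {fetch:2d} B/clk, avg {length} B/instr -> "
              f"{sustained_ipc(fetch, length):.2f} IPC")
```

With an 8-byte fetch the bound drops below 2 only once the average length exceeds 4 bytes, which matches the quoted rule of thumb; with a 16-byte window a 5-byte average still leaves the 2-way limit as the cap.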
Quote:
They were counting on multi-threading to improve efficiency but long instructions reduced the sharing and expected efficiency gains. Multi-threading was probably a mistake but instruction fetch is expensive.

I don't think so. Multi-threading was introduced because an in-order design leaves a lot of execution units underutilized, so this allows them to be better utilized. Even on out-of-order designs it's very useful, and that's the reason why it's still implemented on high-end processors.
Quote:
The 68k has better code density and 16 bit aligned instructions which are an advantage over the x86.

That's true, but x86 also has a 64-bit successor, which is the one that has dominated the market for several years, whereas 68K has only 32-bit members.
Quote:
The 68060 is 42% more energy efficient and is using 21% fewer transistors compared to the most comparable in-order Pentium that the early Atom designs went back to for energy efficiency. The 68060 has good performance with just a 4 byte/cycle instruction fetch so I expect the 68060 could be successful where the Atom was not

That's an incorrect and unfair comparison, which you've already reported before.

The 68060 used fewer transistors because Motorola traditionally cut features from its processors, and the 68060 has both supervisor- and user-mode changes (instructions removed, and a simplified MMU). Another important mistake was not providing a fully-pipelined FPU, which basically crippled its FPU performance. And this processor is only able to pair instructions which are both 2 bytes in length, with several pairing limitations. It also introduced no new instructions. And last but not least, the design didn't reach high frequencies.

Compare it with the Pentium and you'll see the exact opposite scenario, starting from full backward compatibility and ending with very high clock frequencies.

It would be interesting to see the SPECint and SPECfp numbers for both processors, to compare their real-world efficiency.
Quote:
but moving to 6 byte SIMD instructions might change this. It may be better to limit the number of SIMD registers to 16

16 SIMD registers are too few. Intel introduced AVX-512 (with the EVEX prefix) to bring the SIMD registers to 32, which is a decent number for a CISC architecture. IBM found in 64 SIMD registers a good compromise for the new VMX2 (there was a paper about it).
Quote:
or stay with mostly 2 op instructions if SIMD instructions could be kept to 4 bytes.

2 op instructions are anachronistic for FP and SIMD units.
Quote:
A CISC SIMD unit doesn't need as many registers as a RISC SIMD unit. Increased SIMD parallelism with wider registers uses less encoding space.

If it were so simple, then many CPU vendors would have adopted massively long vector SIMD units a long time ago. Which is not the case.
Quote:
Quote:
Using only two bits would be a waste which is why I have sketched many different types of prefixes to reduce waste. But none of those use only two bits, the smallest used 6 bits IIRC out of the 11 available in the Coldfire MVZ/MVS space.
So it would be possible to add 64 bit operations, 16 registers (each of Ax, Dx), sign/zero extension and other features to the standard instruction set. This would partially compensate the larger instructions however what features to include would need a simulator plus compiler support to decide.

With a 16 bit prefix, multiple functionality would need to be provided per prefix. The improved functionality would be convenient but I worry that code density would deteriorate.

It will. But code density is not the most important thing for an ISA.
Quote:
Adding instructions to free/available encoding space improves code density in comparison. My instincts prefer recovering some encoding space with a 64 bit mode where new instructions would be added without affecting 32 bit compatibility.

If you have 32 and 64 bit modes you don't have to care about 32-bit compatibility, since it's "built-in".
Quote:
I don't think adding more registers is worth the cost with a CISC CPU.

Why not? More registers allow a better ABI convention, putting more parameters into registers instead of pushing them onto (and popping them off) the stack.

68K is also short on address registers, which is a pain even for assembly coders.
Quote:
Specifically, I haven't seen any statistics which show elevated instruction counts or memory traffic from the 68k only having 16 registers (statistics I have seen show it to be close to architectures with 32 registers in these categories). RISC needs more registers to reduce the increased instruction count and memory accesses which we do not. RISC needs larger register files, a larger instruction fetch, larger instruction caches and more memory which are major handicaps for performance and energy efficiency on mid-performance CPUs where I believe the 68k could shine. I would prefer to make more efficient use of the advantages we have.

I would like to see statistics, especially code density comparisons, if you added prefixes to a 68k compiler backend. It is no easy task as I would like to do the same for some of my ideas.

It'll be very interesting to see some real-world statistics.
Quote:
The 68k is easier to decode than the x86 and adding prefixes would reduce this minor advantage.

Is there any study which came to that conclusion?

Status: Offline

gregthecanuck

Re: 68k Developement
Posted on 9-Sep-2018 20:07:40

[ #171 ]

Cult Member

Joined: 30-Dec-2003
Posts: 846
From: Vancouver, Canada

Quote:

JimIgou wrote:
@Kremlar

The biggest problem with that is once software requires a Vampire to run, then the prospective base of users becomes limited to the Vampire community.

Every non-standard feature, or beyond legacy capability provides benefits AND limitations.

Just like other high end accelerators, '040, '060s, PPCs, will software designed to support a limited subset of high end equipped Amigas be able to find a large enough audience to prove successful?

In essence, does everybody have to buy a Vampire?

Hi Jim -

The recently released JPEG datatype shows how things *should* be done with system-level code. There are Vampire optimizations that kick in if an 080 processor is detected.

Aminet link: http://aminet.net/package/util/dtype/JFIFdt44

Status: Offline

cdimauro

Re: 68k Developement
Posted on 9-Sep-2018 20:13:31

[ #172 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@matthey

Quote:

matthey wrote:
The Apollo core design most resembles a 68060. The 68060 design resembles how an in-order CISC CPU design would be created today. The biggest limitations in performance compared to today's CPUs are due to the old die size and small area (transistor) requirements.

68060 2,500,000 transistors
ARM Cortex-A9 (32 bit) 26,000,000 transistors
Atom (32 bit) 47,000,000 transistors

You could have many times the number of 68060 cores as a Cortex-A9 or Atom CPU for the same transistor count.

Again, this isn't a fair comparison. At least using the Atom, which has a lot of ISA and micro-architectural changes. The Pentium would have been the correct competitor, albeit I've written that it's still unfair to compare it to the 68060, since the latter is a crippled processor in terms of overall features (and performance too).
Quote:
It is easy to add multiple cores but what is needed today is strong single core performance without resorting to high clock speeds.

Not necessarily. While I strongly prefer high single core/thread performance, lower performance in that scenario might be desirable for lower power consumption and/or price and/or better overall/aggregated performance.
Quote:
The RISC advantages were supposed to be the following.

1) simplify the pipeline to increase clock speeds
   + simplifies CPU design
   - produced more heat than expected
      a) resulted in multi-core CPUs
   - increases the number of instructions and decreases code density
      a) resulted in poor single core performance as more instructions needed to be executed
      b) resulted in increased instruction fetch requirements, cache requirements and memory traffic
      c) resulted in larger and more costly register files to reduce the problems of a and b.
2) move CPU complexity into the compiler
   + simplifies CPU design
   - the compiler lacks knowledge of what the CPU and often programmer is doing
      a) resulted in poor CPU performance
   - compiler design becomes more complex
      a) resulted in RISC CPUs needing better compiler support

RISC has been unable to solve the performance problems from a flawed philosophy (it was obvious that #1 above was a mistake when the Alpha put DEC out of business). CISC has superior performance characteristics. If you don't believe me, then it is time to educate yourself. See the comparison of architecture types from a university professor.

http://cs.uccs.edu/~cs520/S99ch2.PDF

I agree with you and thank you very much for the interesting paper, although it's quite outdated and an update with more modern processors/ISAs would have been much appreciated.

What impresses me is both the first (Stack) and last (Mem-Mem) results. However the 68020 got a very nice and balanced result.
Quote:
Most ARM CPUs are trying to strike a balance between performance and energy efficiency as well. These are much cheaper CPUs to design yet are the majority of CPUs sold. This is where I believe the 68k would be interesting.

Consider that RISC-V will be a strong contender for all current leading architectures.
Quote:
As CISC instruction and addressing mode cycles are reduced, the more of an advantage CISC has over RISC.

Indeed. RISCs lost their advantage once more transistors were available for microprocessors, allowing CISC designs to use the same techniques which permitted RISCs to reach high performance, and even to surpass them thanks to the CISC advantages.

The funny thing is that modern RISC designs don't even resemble their ancestors, but look more like a CISC.
Quote:
The 68k has some of the same advantages as the x86_64 while likely being more energy efficient.

This requires a fair comparison, which isn't possible.
Quote:
You do agree that 68k hardware can expand the Amiga user base. Do you think this amount of expansion is relative to the amount of 68k hardware sold? Do you think the amount of hardware sold is based on supply and demand where a lower price and better value would result in greater demand? Do you think this is a good reason to try to create 68k hardware with a very good price and value?

I don't think that this would change the current 68K market situation. The 68K has had its day, and this was due to Motorola's absurd decision to put a stop to this beautiful processor family. Now the gap with the other mainstream ISAs is too large, and it's unlikely that it'll be closed, even with a good new design and an ISA evolved with more modern features.

Status: Offline

OneTimer1

Re: 68k Developement
Posted on 9-Sep-2018 22:06:28

[ #173 ]

Super Member

Joined: 3-Aug-2015
Posts: 1258
From: Germany

@OlafS25

Quote:

OlafS25 wrote:

will they?

AFAIK the sales of VampireV2 are very low; not even Vesalia seems to be interested. And the Apollo team is still just a group of hobby developers.

Don't get me wrong, I would like to see at least 10k Amiga users buying VampireV2 accelerators or VampireV4 stand-alone systems. But given the low interest I have noticed in Amiga products, I don't believe in a huge success.

The best selling Amiga product seems to be 'Amiga Forever' ...

Status: Offline

wawa

Re: 68k Developement
Posted on 9-Sep-2018 22:21:11

[ #174 ]

Elite Member

Joined: 21-Jan-2008
Posts: 6259
From: Unknown

@OneTimer1

Quote:
not even Vesalia seems to be interested

and you think vesalia only sells to 10k+ audience? whooa!

Status: Offline

matthey

Re: 68k Developement
Posted on 10-Sep-2018 3:44:07

[ #175 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2751
From: Kansas

Quote:

bison wrote:
Quote:
...AmigaOS doesn't have a concept of threads like Linux/BSD.

Looking at it another way, all AmigaOS really has is threads, since every process shares the same address space. What it really lacks is the Linux/BSD concept of isolated processes.

Some people would like to add process isolation to the AmigaOS. This is possible but more hardware dependent for a micro-kernel.

On Micro-kernel Construction
http://www.cs.fsu.edu/~awang/courses/cop5611_s2004/microkernel.pdf

The paper suggests that only 3 kernel calls/functions are needed (Grant, Map, Flush). The memory manager and pager can be in user space with memory isolation. It doesn't sound too difficult to add although the AmigaOS is using lots of shared/public memory. Memory of every library and device (devices are libraries too) is shared with the process which calls them. We could map libraries to the process on open and flush on close with some compatibility loss but it would be more compatible to create a new secure library type/flag for this with classic libraries left in shared memory (perhaps as RO with security settings turned up). Of course there are lots of open doors to supervisor mode which need to be closed and secured. The article is actually about the performance with micro-kernels. One of the conclusions is that "micro-kernels are inherently not portable". "Processors of competing families differ in instruction set, register architecture, exception handling, cache/TLB architecture, protection and memory model. Especially, the latter ones radically influence micro-kernel structure. ... We have to expect that a new processor requires a new micro-kernel design." Some hardware is going to be difficult and inefficient to support. Supporting a large variety of hardware is certainly a pain.
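
To make the three primitives concrete, here is a toy Python model of address spaces with Grant, Map and Flush. It is only a sketch based on a simplified reading of the paper (grant moves a page, map shares it, flush revokes it from every space that received it from the flusher); the class and method names are invented for illustration and have nothing to do with any AmigaOS API.

```python
class AddressSpace:
    def __init__(self, name, pages=()):
        self.name = name
        self.pages = set(pages)
        self.mapped_to = {}  # page -> spaces that received it from us

    def grant(self, page, other):
        """Move a page to another space; the granter loses it."""
        self._require(page)
        self.pages.remove(page)
        other.pages.add(page)
        # Spaces we had mapped the page to now hang off the new owner.
        children = self.mapped_to.pop(page, [])
        other.mapped_to.setdefault(page, []).extend(children)

    def map(self, page, other):
        """Share a page with another space; both can access it afterwards."""
        self._require(page)
        other.pages.add(page)
        self.mapped_to.setdefault(page, []).append(other)

    def flush(self, page):
        """Revoke a page from every space that got it from us, recursively."""
        self._require(page)
        for child in self.mapped_to.pop(page, []):
            if page in child.pages:
                child.flush(page)
                child.pages.discard(page)

    def _require(self, page):
        if page not in self.pages:
            raise KeyError(f"{self.name} does not hold page {page}")


pager = AddressSpace("pager", {1, 2})
app = AddressSpace("app")
lib = AddressSpace("lib")

pager.map(1, app)     # pager and app share page 1
app.map(1, lib)       # app passes it on to lib
pager.flush(1)        # app and lib both lose it, pager keeps it
pager.grant(2, app)   # page 2 now belongs to app only
print(pager.pages, app.pages, lib.pages)
```

In this picture, a classic shared library would be a page set mapped into every caller, while the "secure library" idea corresponds to mapping on open and flushing on close.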

The article found that segment register switching is the fastest way to switch address spaces, complimenting PPC for its support of this. This is gone on newer NXP PPC CPUs though. Instead, large pages are now required, which reduce TLB misses but increase TLB miss overhead, among other problems.

A Survey of Techniques for Architecting TLBs
https://www.academia.edu/29585076/A_Survey_of_Techniques_for_Architecting_TLBs

While performance has generally improved, responsiveness and jitter have often not. Most hardware is made for monolithic Linux/BSD/Windows kernels. I worry that AmigaOS with added address spaces on non-optimal hardware will lose its responsiveness advantage. I wrote about the responsiveness of some RTOSs in past posts.

Quote:

Let's look at how much difference in responsiveness porting the AmigaOS to other hardware and adding modern features might make for lower performance hardware (old study with PII@400MHz but it doesn't matter).

https://www.lisha.ufsc.br/wso/wso2009/papers/st04_03.pdf

OS | Response Time | Latency | Latency Jitter
Windows XP | 200 | 848 | 700
uC/OS-II | 1.92 | 3.2 | 2.32

Windows XP takes 104 times as long to respond, has 265 times the latency and has 301 times as much latency jitter as an RTOS without using the MMU on the same hardware.
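
The ratios quoted above follow directly from the table. A quick Python check using only the numbers given there (the units are whatever the paper used; they cancel out in the ratio):

```python
windows_xp = {"response time": 200, "latency": 848, "latency jitter": 700}
ucos_ii = {"response time": 1.92, "latency": 3.2, "latency jitter": 2.32}

# Ratio of Windows XP to uC/OS-II for each measurement.
ratios = {key: windows_xp[key] / ucos_ii[key] for key in windows_xp}
for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```

The figures in the text are these ratios with the fractional part dropped.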

That was an impressive difference but now let's compare to dedicated real time hardware. The Fido 1100 has best case latency, worst case latency and jitter which are fractions of the time of an RTOS on embedded ARM hardware which is known for its responsiveness (see figure 3 in the following white paper).

REMOVE THE RISC FROM YOUR EMBEDDED DESIGN
http://docplayer.net/45089874-Remove-the-risc-from-your-embedded-design-white-paper.html

This whole whitepaper is worth reading. One point is the memory and lower clock speed advantages of CISC (Fido is 68k). The articles referenced are really good too, although several of the links are broken so I'll give the new ones.

The RISC that did not pay off
https://www.eetimes.com/document.asp?doc_id=1155100

RISCy Business
https://www.embedded.com/electronics-blogs/significant-bits/4024529/RISCy-Business

The Fido MPU supports hardware memory protection and context isolation without paging, using fast memory blocks in registers. The Fido CPU is not quite dynamic enough for the general purpose AmigaOS but the philosophy is right on and there are some ideas which could be used. I had some nice conversations with Fido architect Dave Alsup in regard to a higher performance, more general purpose embedded 68k CPU while I was part of the Apollo Team looking for potential embedded partners. He's a really sharp guy with a different viewpoint about computing.

Status: Offline

OlafS25

Re: 68k Developement
Posted on 10-Sep-2018 8:46:36

[ #176 ]

Elite Member

Joined: 12-May-2010
Posts: 6494
From: Unknown

@OneTimer1

The Vampire cards are a kind of replacement/add-on card for real Amigas like the A500 and A600 (and in future the A1200). They are not ideal for big-box machines because the concept is not to integrate with existing hardware but to replace it. Additionally, they are not cheap in today's terms compared to mainstream hardware, so they are for enthusiasts in the current community. The standalone devices are the first new Amiga hardware since the 90s, so they will not be cheap either, but they are very interesting. Of course we are still talking about sales in the thousands, not millions. It is a niche series of hardware for current Amiga enthusiasts and nerds, no more, no less. Whether it is a success depends on what you compare it with: in today's terms it is of course a tiny market, but compared with other Amiga platforms it is a fast-growing and successful platform.

It is not for everyone of course; I am personally happy with UAE and do not need real hardware. One important point is that people stopped supporting 68k because of the limited hardware base: only a few owned a 68060 plus graphics card, so demanding software like games was not ported. When I asked, the answer was that UAE might be powerful enough for new games, but why support it when you can, for example, run the game on Windows? With the Vampire the 68k platform is lifted, so even software that is not specially adapted to Vampire hardware automatically benefits from more resources.

So potentially games could:
1. Not be adapted at all but automatically benefit from more resources
2. Detect the Vampire at start and offer more features, higher resolutions and so on
3. Be written specifically to use Vampire-specific features

It is up to programmers what they want to support. The third option of course means that games only run on Vampire/standalone, whereas the other two also run on real Amigas without a Vampire and on UAE. If you add potential UAE users, we are talking about much higher numbers of potential users.

Last edited by OlafS25 on 10-Sep-2018 at 09:22 AM.
Last edited by OlafS25 on 10-Sep-2018 at 09:20 AM.
Last edited by OlafS25 on 10-Sep-2018 at 08:51 AM.

Status: Offline

CosmosUnivers

Re: 68k Developement
Posted on 10-Sep-2018 9:05:15

[ #177 ]

Regular Member

Joined: 20-Sep-2007
Posts: 113
From: Unknown

Why this thread ?

You ALL know that ALL is blocked on Classic 68k :

- CyberGraphX : blocked by Phase5
- Picasso96 : blocked by Jens Schönfeld
- AGA components : blocked by Bill McEvil
- Warp3D : blocked by Hyperion
- 68k CPUs : blocked by NXP
- Kickstart : blocked by Cloanto and Hyperion
- BoingBag : blocked by Haage & Partners
- Poseidon : blocked by Jens Schönfeld
- PCI stuff : blocked by Elbox...

All is done to make any Classic return totally impossible...

End of discussion...

Status: Offline

megol

Re: 68k Developement
Posted on 10-Sep-2018 9:09:16

[ #178 ]

Regular Member

Joined: 17-Mar-2008
Posts: 355
From: Unknown

@matthey

Now it's obvious that I'm not going crazy: twice I have tried to reply to your post and twice the whole text has just disappeared when previewing. This is strange and frustrating no matter if it's due to the browser (Chrome) or the website. :(
So here is a reply without quotes (they have to be previewed to be right IME) and shorter (third time I write this down).

--
While the 1995 paper by Liedtke is a classic, it is also outdated.

Later revisions of the L3/L4 family moved towards a portable design without losing focus on performance. See the paper below:
From L3 to seL4: What Have We Learnt in 20 Years of L4 Microkernels?
http://sigops.org/sosp/sosp13/papers/p133-elphinstone.pdf

Especially see section 4.5 on page 145 "Non-portability" including:
"This argument was debunked by Liedtke himself ... Careful design and implementation made it possible to develop an implementation that was 80–90% architecture-agnostic"

--
The best way to architect TLBs for the Amiga would be to essentially remove them and instead keep a protection cache, which can be slower than a TLB without causing performance problems (assume a memory access is legal and restart from a checkpoint if an access violation occurs), plus a memory-level address translator. This is possible as the Amiga OS assumes a single address space.
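
To make the "assume legal, restart from a checkpoint" idea concrete, here is a toy software model. It is only a sketch under loud assumptions: real hardware would do this with speculative state rather than by copying memory, and the function name, the dict standing in for RAM and the addresses are all invented for the illustration.

```python
def run_optimistically(memory, writes, writable):
    """Apply writes assuming they are legal; roll back if any was not.

    memory:   dict of address -> value (stands in for RAM)
    writes:   list of (address, value) pairs issued since the checkpoint
    writable: set of addresses the task is allowed to write
    """
    checkpoint = dict(memory)        # state to restart from
    for address, value in writes:
        memory[address] = value      # proceed without waiting for the check
    # The slower protection lookup completes after the accesses were issued.
    if all(address in writable for address, _ in writes):
        return True
    memory.clear()
    memory.update(checkpoint)        # violation: restore the checkpoint
    return False


ram = {0x1000: 1, 0x2000: 2}
ok = run_optimistically(ram, [(0x1000, 10)], writable={0x1000})
bad = run_optimistically(ram, [(0x1000, 99), (0x2000, 99)], writable={0x1000})
```

The first call commits its write; the second touches an address outside the allowed set, so all of its writes are undone and `ram` is left as it was after the first call. The point is that the check is off the critical path, which is why it can afford to be slower than a TLB lookup.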

--
The Fido is using a barrel style processor combined with an axed cache and isn't really comparable to a mainstream processor for several reasons.
The deterministic cache is actually a feature of most embedded processors and even a few x86 processors: cache line locking. Read things into the cache and disable updates of that cache line - voila! Actually this has been used with mainstream x86 processors even without explicit support, for the early startup when DRAM hasn't been initialized; that's more of a hack though.

Also, while latency varies (with the variance being the time jitter), it doesn't matter except for specific hard real-time tasks. Even for hard real-time it doesn't matter if the worst case timing is lower than the maximum allowed response time.
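
The distinction can be sketched in a few lines. The latency samples and deadlines below are made-up numbers for illustration only, and jitter is taken here as the spread between the worst and best observed latency, which is one common way to express it.

```python
def jitter(latencies):
    """Spread between the worst and the best observed latency."""
    return max(latencies) - min(latencies)


def meets_deadline(latencies, deadline):
    """True if the worst observed latency is below the allowed response time."""
    return max(latencies) < deadline


# Hypothetical measurements in microseconds, not taken from any real system.
samples = [12, 15, 11, 40, 13]

spread = jitter(samples)
relaxed = meets_deadline(samples, deadline=50)
strict = meets_deadline(samples, deadline=30)
```

With these samples the jitter is large relative to the typical latency, yet the 50 microsecond deadline is still met; only the 30 microsecond deadline fails, and it fails because of the worst case, not because of the jitter as such.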

The jitter is also unavoidable, as it usually isn't the TLBs or protection causing it but the mere fact that _caches_ are needed for any reasonably powerful processor that isn't a barrel style one. For an example of a high performance design without caches one can look at the Tera MTA and realize the inherent delay for each and every memory access that had to be covered by explicit multi-threading.

--
Using the oldest memory protection design (base and bound) like in the Fido seems like a bad idea. Even developing that further into a segmentation design (multiple bases and bounds) is IMO a bad idea as long as it is to be applied to the Amiga and the Amiga OS. It would require a complete redesign.
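
For readers unfamiliar with the terms, a base and bound check can be sketched as below. This is a generic illustration and not the Fido's actual register layout; the `Region` type, the `access_allowed` helper and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int   # first address the task may touch
    bound: int  # size of the region in bytes


def access_allowed(regions, address, size=1):
    """True if [address, address + size) lies fully inside one region."""
    return any(
        r.base <= address and address + size <= r.base + r.bound
        for r in regions
    )


# One region is plain base and bound; several regions is segmentation-style.
task_regions = [Region(base=0x10000, bound=0x4000), Region(base=0x80000, bound=0x100)]

inside = access_allowed(task_regions, 0x10010, size=4)
straddles_end = access_allowed(task_regions, 0x13FFE, size=4)
outside = access_allowed(task_regions, 0x20000)
```

The check itself is cheap. The difficulty with the Amiga OS is that tasks freely share pointers into memory they do not own, so a fixed handful of regions per task does not describe what a task legitimately touches.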

And I say that as someone who has always liked the idea of a proper segmentation design, and the x86 is about as far as one can get from a proper one. It could be a nice hobby project to make a usable segmentation based processor and OS - but the result wouldn't be an Amiga*.

(* or perhaps it would, it just seems to be a label nowadays)

Status: Offline

wawa

Re: 68k Developement
Posted on 10-Sep-2018 12:16:22

[ #179 ]

Elite Member

Joined: 21-Jan-2008
Posts: 6259
From: Unknown

@CosmosUnivers

Quote:

- CyberGraphX : blocked by Phase5
- Picasso96 : blocked by Jens Schönfeld

there is an open source aros equivalent to these subsystems, along with a wrapper to use p96 device drivers.

Quote:
- Warp3D : blocked by Hyperion

there is wazp3d, including certain hardware accelerated backends, among others on aros i386.

Quote:
- Kickstart : blocked by Cloanto and Hyperion

aros provides an open and apparently pretty compatible universal kickstart replacement.

Quote:
- BoingBag : blocked by Haage & Partners

aros provides most if not all functionality included in these patches out of the box, also on amiga.

Quote:
- Poseidon : blocked by Jens Schönfeld

only the device drivers that are supplied with the hardware. poseidon and its sources are included in aros and have been tested working with the genuine device drivers.

Quote:
- PCI stuff : blocked by Elbox...

these amiga pci bridges are unfortunately not yet fully implemented in aros..

Status: Offline

Hypex

Re: 68k Developement
Posted on 10-Sep-2018 13:09:43

[ #180 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@OneTimer1

Quote:
Best selling Amiga Product seems to 'Amiga Forever' ...

That doesn't even run on an Amiga.

Shows the state of the market after all these years.

Status: Offline


Amigaworld.net was originally founded by David Doyle