/  Forum Index
   /  General Technology (No Console Threads)
      /  Applied Micro moving away from PowerPC
cdimauro 
Re: Applied Micro moving away from PowerPC
Posted on 1-Nov-2014 20:52:33
#121 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Hypex

Quote:

Hypex wrote:
@cdimauro

Quote:
But it costs a register, A4


Yes, just one. However, the 68K design also costs a register, A7, for the stack. I never got that; it looked out of place and like bad design to use an address register as the stack pointer.

Because it simplifies the hardware design, and that's why it's a very common choice for other architectures too.

However, with A4-A7 in use, only four address registers are left for 68K application code, which isn't much.
Quote:
Quote:
16-bit (short) absolute address can be used instead of a 32-bit one,


Never thought about this for 68K. It would need an MMU to use it, so it's a rare optimisation.

Sure, but MMUs are very common and "available" nowadays. We only need to use them.
Quote:
However, one thing I forgot until now is PC-relative addressing. As long as the address is reachable from the instruction, this was also a faster way to access data, or so I read. I think this was useful for ASM, as you didn't need to manage it through another register.

It's useful, for sure. But the 68K had no way to write to memory using a PC-relative addressing mode.

You can use it only for constants, or for jump tables.
Quote:
Quote:
So, have you encountered only one bug in your coding life?


I doubt it. Bugs can have many forms. I consider a crash to be a show stopper.

It is, usually.
Quote:
Perhaps the ones I hate the most are in code I have ported, since it isn't code I wrote and it can be hard to know why it is crashing. It also gets annoying getting bug reports about code you haven't written in the first place. You want to move them upstream, but the people keep asking you.

That's not your fault.
Quote:
Quote:
Reading an int, for example, doesn't guarantee that you're reading 4 bytes.


It used to be 2 bytes. And why we have macros like int32 these days.

Even using macros/defines doesn't solve the problem. A compiler can reserve 64-bit storage for int32, for example, since this kind of information is implementation-specific (the language specification says nothing about such low-level details). It means that if you read an int32, you may end up reading 64 bits of data.

If you want to write portable code which handles data, either you have to define several macros/defines to correctly cover all supported platforms, or simply resort to the classic "read a byte at a time, with shifting and masking of data".
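That classic byte-at-a-time technique can be sketched in a few lines of C. This is a hypothetical helper (the name and the little-endian layout are my assumptions, not from the thread); it behaves identically even under ANSI C89, because unsigned long is guaranteed to hold at least 32 bits on any conforming compiler:

```c
/* Portable 32-bit little-endian read: one byte at a time, with shifting
   and masking. The result does not depend on the host's int size,
   endianness or struct layout. */
unsigned long read_u32_le(const unsigned char *buf)
{
    return ((unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24)) & 0xFFFFFFFFUL;
}
```

The final mask matters on machines where unsigned long is wider than 32 bits; without it the function would still be correct here, but the mask documents the intended width.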
Quote:
Quote:
Absolutely. But you need a plug-in for this.


That sounds good then. Almost putting OS4 compiling to shame.

But you need to write the plug-in first.

broadblues 
Re: Applied Micro moving away from PowerPC
Posted on 1-Nov-2014 22:38:46
#122 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@cdimauro

Quote:


It doesn't help to solve the problem even using macros / defines. A compiler can reserve 64-bit data for int32, for example, since this kind of information is implementation-specific (the language specifics say nothing about such low-level detail). It means that if you read an int32, it may end up reading 64-bit of data.



Huh? If your headers set up a uint32 which is in fact a 64-bit integer, then clearly they are broken in some way. Otherwise there is no point. uint32 is usually a typedef, not a macro, anyway. Of course, for portable code you should always use the sizeof() construct and not make any assumptions about the size / alignment of types.

I suppose it's possible for a compiler environment not to have an int32-sized datatype at all, but then the headers would not define it.

_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

Samurai_Crow 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 2:43:33
#123 ]
Elite Member
Joined: 18-Jan-2003
Posts: 2320
From: Minnesota, USA

@cdimauro

Quote:

cdimauro wrote:
@Hypex

Quote:

Hypex wrote:
@cdimauro

"But it costs a register, A4"

Yes, just one. However, the 68K design also costs a register, A7, for the stack. I never got that; it looked out of place and like bad design to use an address register as the stack pointer.

Because it simplifies the hardware design, and that's why it's a very common choice for other architectures too.

However, with A4-A7 in use, only four address registers are left for 68K application code, which isn't much.


A5 can be freed up when not using the debugger, and if you're not accessing a shared library at the moment, A6 can also be repurposed.

Quote:
Quote:
"16-bit (short) absolute address can be used instead of a 32-bit one,"

Never thought about this for 68K. It would need an MMU to use it, so it's a rare optimisation.

Sure, but MMUs are very common and "available" nowadays. We only need to use them.


Not on Commodore Amigas. You'd have to get an accelerator card for most Amigas, because Commodore used the embedded-controller edition of most CPUs, thus eliminating the MMU capabilities.

cdimauro 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 6:21:21
#124 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@broadblues

Quote:

broadblues wrote:
@cdimauro

Quote:


Even using macros/defines doesn't solve the problem. A compiler can reserve 64-bit storage for int32, for example, since this kind of information is implementation-specific (the language specification says nothing about such low-level details). It means that if you read an int32, you may end up reading 64 bits of data.



Huh? If your headers set up a uint32 which is in fact a 64-bit integer, then clearly they are broken in some way. Otherwise there is no point. uint32 is usually a typedef, not a macro, anyway. Of course, for portable code you should always use the sizeof() construct and not make any assumptions about the size / alignment of types.

I suppose it's possible for a compiler environment not to have an int32-sized datatype at all, but then the headers would not define it.

No, a compiler may use 64-bit integers for everything, even for a char, and can still define the full integer type hierarchy as well.

Defining uint32 means only this: "this type lets you hold and handle unsigned integers from 0 to 2^32 - 1".

But you cannot assume that such a type uses only 4 bytes.

That's if we talk about portability.

Something on the topic: http://stackoverflow.com/questions/2331751/does-the-size-of-an-int-depend-on-the-compiler-and-or-processor
But the ANSI/ISO C draft says the same things, for sure.

@Samurai_Crow

Quote:

Samurai_Crow wrote:
@cdimauro

Quote:

cdimauro wrote:
@Hypex

However with A4-A7 registers being used, only 4 are left for 68K application code, which isn't that much.


A5 can be freed up when not using the debugger

It depends on the language. If you use Pascal, for example, with inner functions, it's likely (I don't know how a compiler decides to implement it; that's what I'd do if I had to write a 68K compiler supporting inner functions) that A5 is used to hold the frame chain for quickly accessing the "non-local" (but not global) variables defined in the outer function(s).

So, even without using a debugger, this register can be used as well.
Quote:
and if you're not accessing a shared library at the moment, A6 can also be repurposed.

Of course.
Quote:
Quote:
Sure, but MMUs are very common and "available" nowadays. We only need to use them.


Not on Commodore Amigas. You'd have to get an accelerator card for most Amigas because Commodore used the embedded-controller edition of most CPUs thus eliminating most of the MMU capabilities.

I know, but I don't think that developers will use "plain vanilla Amigas" to develop new software. However, even with an MMU on board, the problem is the OS, which defines a common address space for everything: that makes it almost impossible to use an MMU to place a task's global variables at the lower virtual addresses.

BigGun 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 7:45:36
#125 ]
Regular Member
Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

CHK

The CHK instruction of the 68K is there to check array boundaries and to prevent accidental accesses outside an array.

The 68K does provide this feature on purpose.
This CHK instruction is used by AMIGA Modula and Oberon compilers.

The main reason for program crashes is accidental miscalculation of pointers and overwriting of data outside the allowed range. Oberon programs do not suffer from these errors.

You do not need to CHK every memory access, only those whose address is not known at compile time. All addresses which are known at compile time can be checked by the compiler and are therefore secure. Only accesses which are calculated at runtime need to be checked.

This means a simple for loop which runs from $START to $END will be checked by the compiler as long as these are constants.

The CHK instruction is specially optimized for this checking purpose and will do it more efficiently than the same work done with normal code.

The CPU is aware that the CHK should normally not trigger and will therefore not allocate a branch prediction slot for it. On a proper CPU the CHK instruction takes only a single clock cycle. On a superscalar CPU the CHK instruction can also be executed in parallel in a free slot in the 2nd ALU, which means the CHK can often be executed for 0 extra cycles.

If you keep in mind that CHK is not needed for every memory access but only for those that are calculated, and if you consider that CHK can be executed very fast, then you will agree that the overhead of securing a program with CHK is very small.
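For readers who don't know the 68K, the effect of CHK can be sketched in C. This is a hypothetical illustration (the function names are mine, not from the thread), assuming the compiler emits one such check before each access whose index is only known at run time and omits it where the range is provable at compile time:

```c
#include <stdlib.h>

/* What 68K CHK does, in C terms: trap if an index is below zero or above
   an upper bound. On the 68K the trap is a CHK exception; abort() stands
   in for it here. */
static long checked_index(long idx, long upper_bound)
{
    if (idx < 0 || idx > upper_bound)
        abort();
    return idx;
}

/* Bounds-checked array read, as an Oberon or Modula-2 compiler might
   emit it for an index computed at run time. */
int read_element(const int *array, long n, long idx)
{
    return array[checked_index(idx, n - 1)];
}
```

In compiled 68K code the whole if/abort pair collapses into the single CHK instruction, which is the efficiency argument being made above.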

While Oberon compilers do this CHK securing by default, it can be disabled if wanted. This means the developer will use CHK during development and has the choice to also create a tuned version if desired.

Some people might ask: "But can I not use the MMU for the same?" The MMU can only spot accesses to different 4K memory pages. As you know, overwriting an array by a _single_ byte will trash memory. A CHK will secure the program against these issues; the MMU cannot offer the same security.

OK, the MMU does not have the same granularity, so it will not catch all errors, but at least some. Is the MMU not at least faster, or free?

No, the MMU is not free. It also comes at a price. Depending on how tagging and translation are done in the CPU, it will either take an extra pipeline step or can be done in parallel. If the tagging is done in parallel, then each task needs its own MMU table. This means the MMU table needs to be reloaded and swapped on task change. An MMU table also needs to be cached in the CPU using TLB entries; these TLB entries need CPU space and are therefore limited by design. If an access misses such an entry, a table walk in memory needs to be done to reload it. This table walk is by design always very slow.

The price for enabling an MMU is relatively small; it's not like your system then runs at half speed, but it will lose a few percent of performance. The MMU table will eat a little bit of free memory. The flushing and reloading of the table, and (depending on the CPU design) the often-needed flushing of branch prediction entries, will make a task switch slower. TLB entries will not need to be reloaded on every access, but when they do, they create a slowdown.

Which program will run faster:
A program using CHK or a program using the MMU?

This is hard to answer. Both have some overhead, and in both cases the overhead is small. One thing is clear: the MMU is not able to spot short overwrites and therefore does not offer the same protection.



_________________
APOLLO the new 68K : www.apollo-core.com

Kronos 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 8:18:08
#126 ]
Elite Member
Joined: 8-Mar-2003
Posts: 2562
From: Unknown

@Samurai_Crow

EC CPUs used by C= :

68EC020 in the A1200 and CD32: no relevance to the MMU, as that would have been an extra chip even with the full 68020.

68EC030 in the A4000/030: yes, here the MMU is missing, but those A4000s were produced more as an afterthought to fill the pricing gap between the A1200 and the A4000/040.

All other Amigas either used a CPU with an MMU (A3000, (A2500) and A4000/040) or a 68000, where an MMU wasn't even an add-on option.

_________________
- We don't need good ideas, we haven't run out on bad ones yet
- blame Canada

umisef 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 8:42:55
#127 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@cdimauro

Quote:
No, a compiler may use 64-bit integers for everything, even for a char, and can also define the full integers types hierarchy as well.

Defining uint32 means only that: "this type let you hold and handle unsigned integers from 0 to 2^32 - 1".


You are about 15 years out of date. Since the ANSI/ISO C99 standard, there are indeed fixed-size types. Thus, a compiler which provides a "uint32_t" in stdint.h must do so with a size of exactly 32 bits.

See Wikipedia for the details in human-readable form, or section 7.18.1.1 of the standard for the definitive "no padding bits" promise.

Of course, it being standard does not necessarily protect you from compiler writers taking liberties (and, in the process, making their compiler non-compliant); nor does it protect you from "helpful" library authors who may, in their libraries' header files, "fill in" whatever types the compiler is "missing". But for the last 15 years, the C standard has most definitely provided a variety of actually-fixed-size integer types.
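A minimal sketch of what that guarantee buys you (the typedef trick is a common pre-C11 substitute for a static assert; the names here are mine): the build fails on any compiler whose uint32_t is not exactly 32 bits, and arithmetic on the type wraps modulo 2^32 by definition:

```c
#include <stdint.h>
#include <limits.h>

/* Compile-time check: the array gets size -1, a hard error, unless
   uint32_t is exactly 32 bits wide. C99 7.18.1.1 guarantees this holds
   whenever the type is provided at all. */
typedef char uint32_is_exactly_32_bits[
    (sizeof(uint32_t) * CHAR_BIT == 32) ? 1 : -1];

/* Unsigned fixed-width arithmetic wraps modulo 2^32, portably. */
uint32_t wrap_add(uint32_t a, uint32_t b)
{
    return a + b;
}
```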

cdimauro 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 20:00:10
#128 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@umisef: talking about portability, unfortunately ISO C99 isn't widely adopted. If you want to cover most of the platforms, it's still ANSI C89 which is the most used standard for this language.

cdimauro 
Re: Applied Micro moving away from PowerPC
Posted on 2-Nov-2014 20:33:08
#129 ]
Elite Member
Joined: 29-Oct-2012
Posts: 3650
From: Germany

@BigGun

Quote:

BigGun wrote:
CHK

The CHK instruction of the 68K is there to provide checking
of array boundaries and to prevent accidently access of outside an array.

Unfortunately it's a limited instruction, as I discussed in post #84.

That's also the reason why the BOUND instruction on x86 was practically never used, and it was abandoned in the end (x64 dropped it).
Quote:
The 68K does provide this feature on purpose.
This CHK instruction is used by AMIGA Modula and Oberon compilers.

How many programs were written in Modula(-2) or Oberon?

I don't mean to say that they are not good languages (I was a Pascal fan), but objectively they were not used that much, unfortunately.
Quote:
The main reason for Program crashes is accidental
miscalculation of pointer and overwriting of data outside the allowed range.
Oberon programs do not suffer from these errors.

But it wasn't and it isn't a mainstream language.
Quote:
You do not need to CHK every memory access, only those whose address is not known at compile time. All addresses which are known at compile time can be checked by the compiler and are therefore secure. Only accesses which are calculated at runtime need to be checked.

Certainly, but that isn't the only case. Unfortunately, many times the addresses are known only at runtime.
Quote:
This means a simple for loop which runs from $START to $END will be checked by the compiler as long as these are constants.

Which isn't the general case: the start and end are not always known at compile time.
Quote:
The CHK instruction is specially optimized for this checking purpose and will do it more efficiently than the same work done with normal code.

IF/WHEN you can use it.
Quote:
The CPU is aware that the CHK should normally not trigger and will therefore not allocate a branch prediction slot for it. On a proper CPU the CHK instruction takes only a single clock cycle. On a superscalar CPU the CHK instruction can also be executed in parallel in a free slot in the 2nd ALU, which means the CHK can often be executed for 0 extra cycles.

It also means that you're giving up the chance to execute another instruction which does useful work in that slot. So performance is still impacted by executing a CHK.
Quote:
If you keep in mind that CHK is not needed for every memory access but only for those that are calculated, and if you consider that CHK can be executed very fast, then you will agree that the overhead of securing a program with CHK is very small.

You are putting many conditions here. Can you ensure that this is the common case in regular code?
Quote:
While Oberon compilers do this CHK securing by default, it can be disabled if wanted. This means the developer will use CHK during development and has the choice to also create a tuned version if desired.

That's what happened with Pascal, which has array index checks enabled by default per the standard. Unfortunately, the most used compilers always disabled them because of the overhead of the checks. So programs with bugs were deployed to the end users.
Quote:
Some people might ask: "But can I not use the MMU for the same?" The MMU can only spot accesses to different 4K memory pages. As you know, overwriting an array by a _single_ byte will trash memory. A CHK will secure the program against these issues; the MMU cannot offer the same security.

That's not true. The MMU can offer the same security as CHK. Take a look at AddressSanitizer, a project developed by Google: http://code.google.com/p/address-sanitizer/
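The single-byte overwrite case can be made concrete. A hypothetical sketch (the function name is mine): calling overflow_by(1) writes one byte past a 16-byte malloc block. Plain MMU paging normally stays silent here because the whole page is mapped, while a build with gcc or clang's -fsanitize=address reports it as a heap-buffer-overflow at the exact byte:

```c
#include <stdlib.h>

/* Writes to the byte at offset 15 + n of a 16-byte heap block.
   n == 0 hits the last valid byte; n == 1 goes one byte past the end,
   which the MMU's 4K page granularity cannot catch. */
int overflow_by(int n)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return -1;
    buf[15 + n] = 0;
    free(buf);
    return 0;
}
```

Compiling with `-g -fsanitize=address` and calling overflow_by(1) aborts with a report pinpointing the overflowing store and the allocation site.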
Quote:
OK, the MMU does not have the same granularity, so it will not catch all errors, but at least some. Is the MMU not at least faster, or free?

No, the MMU is not free. It also comes at a price.

Yes: https://code.google.com/p/address-sanitizer/wiki/ComparisonOfMemoryTools

It's a high price.
Quote:
Depending on how tagging and translation are done in the CPU, it will either take an extra pipeline step or can be done in parallel. If the tagging is done in parallel, then each task needs its own MMU table. This means the MMU table needs to be reloaded and swapped on task change. An MMU table also needs to be cached in the CPU using TLB entries; these TLB entries need CPU space and are therefore limited by design. If an access misses such an entry, a table walk in memory needs to be done to reload it. This table walk is by design always very slow.

But TLB caches have very good hit ratios nowadays, right?
Quote:
The price for enabling an MMU is relatively small; it's not like your system then runs at half speed, but it will lose a few percent of performance. The MMU table will eat a little bit of free memory. The flushing and reloading of the table, and (depending on the CPU design) the often-needed flushing of branch prediction entries, will make a task switch slower. TLB entries will not need to be reloaded on every access, but when they do, they create a slowdown.

Which isn't that big a deal with modern processors, which have tens of GB/s of bandwidth and billions of cycles per second.
Quote:
Which program will run faster:
A program using CHK or a program using the MMU?

This is hard to answer.
Both have some overhead. In both cases the overhead is small.

It's huge, unfortunately. You can take a look at the benchmarks which I posted before.

That's why hardware acceleration for boundary checks is being proposed. Intel recently introduced MPX; you can take a look at it here: https://code.google.com/p/address-sanitizer/wiki/IntelMemoryProtectionExtensions
However MPX is not comparable to the CHK scenario, because it offers a more complete solution and gives the possibility to track and check buffer usage.

Why is there so much demand for hardware acceleration of boundary checks? Because it's STRONGLY needed by the industry, which wants something better than CHK's limited scope. Another proposal is here: https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerInHardware
There are other proposals, and personally I've worked on a brand new one which is currently under review.

So, Gunnar, the situation is not as simple as you imagine.
Quote:
One thing is clear: the MMU is not able to spot short overwrites and therefore does not offer the same protection.

No, the MMU can offer the same protection.

Last edited by cdimauro on 02-Nov-2014 at 08:35 PM.
Last edited by cdimauro on 02-Nov-2014 at 08:34 PM.

Samurai_Crow 
Re: Applied Micro moving away from PowerPC
Posted on 4-Nov-2014 6:01:31
#130 ]
Elite Member
Joined: 18-Jan-2003
Posts: 2320
From: Minnesota, USA

@cdimauro

Quote:

cdimauro wrote:
@umisef: talking about portability, unfortunately ISO C99 isn't widely adopted. If you want to cover most of the platforms, it's still ANSI C89 which is the most used standard for this language.


C99 is hardly accepted, and now ISO C 2011 is out and there are hardly any compilers for it! C is becoming a dead language just as it is starting to get good. The 1989 version stinks by comparison. Is anyone interested in type-safe macros?

michalsc 
Re: Applied Micro moving away from PowerPC
Posted on 4-Nov-2014 7:32:56
#131 ]
AROS Core Developer
Joined: 14-Jun-2005
Posts: 377
From: Germany

@cdimauro

Quote:
unfortunately ISO C99 isn't widely adopted. If you want to cover most of the platforms, it's still ANSI C89 which is the most used standard for this language.


For many years I haven't found any compiler[*] without stdint.h or cstdint headers. And these guarantee me that int8_t, int16_t, int32_t and int64_t are 8, 16, 32 and 64 bits wide, respectively.

[*] If I found a compiler not supporting these, I would definitely try not to use it at all ;)

BigGun 
Re: Applied Micro moving away from PowerPC
Posted on 4-Nov-2014 11:15:36
#132 ]
Regular Member
Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

@cdimauro

Quote:


Quote:
Some people might ask: "But can I not use the MMU for the same?" The MMU can only spot accesses to different 4K memory pages. As you know, overwriting an array by a _single_ byte will trash memory. A CHK will secure the program against these issues; the MMU cannot offer the same security.

That's not true. The MMU can offer the same security as CHK.


If you use the MMU to spot coding bugs,
then the MMU will slow down the system a few percent this way.
But by design it cannot find all wrong memory accesses, only some of them.

CHK has comparable cost, and is better able to spot wrong memory accesses.

Let's not waste time here talking about approaches which waste insane amounts of CPU power.



Quote:

Quote:
The price for enabling an MMU is relatively small; it's not like your system then runs at half speed, but it will lose a few percent of performance. The MMU table will eat a little bit of free memory. The flushing and reloading of the table, and (depending on the CPU design) the often-needed flushing of branch prediction entries, will make a task switch slower. TLB entries will not need to be reloaded on every access, but when they do, they create a slowdown.


Which isn't that big a deal with modern processors, which have tens of GB/s of bandwidth and billions of cycles per second.


Is that analysis right?

You can also look at it from this angle:
if you have a CPU running at a gigahertz,
then a single memory access costs 200 or 300 cycles.
To refetch a TLB entry you need to do a table walk with several memory accesses.
This means a TLB miss has HUGE costs.
You can execute hundreds of CHK instructions for a single TLB miss.




How costly a CHK is will depend on each core's internals.
The advantage of CHK is that it's inside the code,
and code is mostly linear in memory, which means it can be streamed in.
If your core has enough Icache bandwidth then CHK comes cheap.

The 68K CHK is better than doing the same with "normal" instructions:
the CHK instruction is shorter and can be executed faster than "normal" comparisons on other cores.

Last edited by BigGun on 04-Nov-2014 at 11:41 AM.

_________________
APOLLO the new 68K : www.apollo-core.com


Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle