Amigaworld.net - The Amiga Computer Community Portal Website

Amiga General Chat

Can more people becomes a productive Amiga community member?

ppcamiga1

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 6:43:33

[ #61 ]

Cult Member

Joined: 23-Aug-2015
Posts: 777
From: Unknown

@matthey

PPC is fast enough that nobody cares about performance losses from the MMU.

Gunnar

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 7:39:03

[ #62 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@cdimauro

Quote:

OK, but even column-based rendering doesn't change the situation much.

FACT: The situation is totally different with row or column rendering.

Quote:

Second, hardware prefetchers can easily detect those regular memory-access patterns. If not, you can always use prefetch instructions to load the pages that will be processed in advance.

You misunderstand this.
We are NOT talking about data-cache misses.
We are talking about ATC misses. The ATC is NOT the data cache.

Quote:

Third, you don't swap pages if you have enough memory (which is the case here). So you only have to map physical pages to the specific virtual pages.

Our problem has nothing to do with swapping.

Quote:

Fourth, you can preload all the pages used by the framebuffer, so that the pages used by the first two levels of indirection are already loaded when rendering starts. This way only the last-level pages need to be mapped each time during rendering.

You cannot do this.
The problem is that the MMU is flushed 3 times per column you render.
This means you flush/reload the complete MMU over 3000 times per screen.
This kills system performance.
The flushes also make the MMU lose all its entries for the textures it wants to load.
This means that not only writes are slow but also texture accesses.

Quote:

Fifth... if you have full control of the system, then you can carefully manipulate the MMU descriptors in advance while rendering happens, so that the impact becomes basically negligible.

The problem is bigger than your MMU, so you cannot solve it.
This example is a case where you juggle more apples than you can handle.

Quote:

So there are ways to improve performance even when the MMU is enabled with the regular 4 kB pages. At least in the normal cases, where memory access is NOT totally random but rather quite regular.

You are mixing up the data cache and the MMU here.
MMU entries are never prefetched.
This does not work the way you think.

OK, I see that I took too much knowledge about game programming and CPUs for granted.
I need to explain where the problem is:

Let's take DOOM on the 68040 CPU as an example.
For illustration, let's say the game runs at a width of 1024 pixels.

The MMU has 64 ATC entries, so it can map 256 KB at a time (using 4 KB pages, the default mode).
Each column of the screen spans 192 pages: 3 times more than our MMU has entries!

The game needs to load texture data.
The game wants to keep as much texture data as possible in the CPU cache.
The game writes to the screen.
The game does NOT want to cache the screen.
So the game wants to set the 192 MMU pages of the screen to write-through.

The problem is that the screen needs more MMU entries than the MMU has.
This means that while painting just 1 column,
all the MMU entries are reloaded and flushed 3 times.
This makes the writing very slow.
It also makes the MMU lose all its entries for the texture memory.
So the reloading/flushing of the MMU entries not only makes the writes
stutter but also slows down texture loading.
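The thrashing described above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not a model of real 68040 hardware: it assumes an 8-bit chunky framebuffer of 1024 x 768 pixels (one byte per pixel, which gives exactly 192 pages of 4 KB) and treats the 64-entry ATC as fully associative with LRU replacement, whereas the real 68040 ATC is set-associative. All names are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096    # 4 KB pages, as in the post
ATC_ENTRIES = 64    # ATC size quoted in the post

# Assumption: 8-bit chunky framebuffer, 1024 x 768, one byte per pixel.
WIDTH, HEIGHT = 1024, 768


def count_atc_misses(addresses, entries=ATC_ENTRIES, page_size=PAGE_SIZE):
    """Count translation misses for an address stream, modelling the ATC as a
    fully associative LRU cache (a simplification of the real hardware)."""
    atc = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in atc:
            atc.move_to_end(page)          # hit: page becomes most recently used
        else:
            misses += 1                    # miss: a table walk would happen here
            if len(atc) >= entries:
                atc.popitem(last=False)    # evict the least recently used page
            atc[page] = True
    return misses


def column_order(width=WIDTH, height=HEIGHT):
    """DOOM-style wall drawing: each column from top to bottom."""
    for x in range(width):
        for y in range(height):
            yield y * width + x


def row_order(width=WIDTH, height=HEIGHT):
    """Scanline drawing: each row from left to right."""
    for y in range(height):
        for x in range(width):
            yield y * width + x


if __name__ == "__main__":
    col = count_atc_misses(column_order())
    row = count_atc_misses(row_order())
    print("framebuffer pages:", WIDTH * HEIGHT // PAGE_SIZE)
    print("column order:", col, "misses =", col // ATC_ENTRIES, "full ATC reloads")
    print("row order:   ", row, "misses")
```

Under these assumptions, column order misses on every page of every column (196608 misses, i.e. 3072 complete ATC reloads per frame), while row order misses only once per page (192 misses). That matches both the "over 3000 times per screen" figure and the point that row and column rendering behave very differently.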

I hope you understand the problem now, and that this example is clear to you.

Let's return to the main topic:
The main topic was that today's use cases often use a lot of memory.
A lot more memory than the 040/060 MMU can handle well.

One could argue: let's make the MMU 8 times bigger and give it 512 ATC entries.
Yes, this would help.
But the price is high.
Do we really want to waste so much chip space on an inefficient MMU?

One can also argue: let's think about a smart and elegant solution to make the MMU more efficient,
so that it can handle more memory better without wasting huge amounts of resources.

I like the idea of an elegant solution.
In my opinion, the Amiga was always about smart solutions.
For example, the smart Copper concept allowed the Amiga to show more colors in games.

Last edited by Gunnar on 23-Feb-2024 at 08:00 AM.

kolla

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 7:49:05

[ #63 ]

Elite Member

Joined: 21-Aug-2003
Posts: 2917
From: Trondheim, Norway

AmigaOS has full process separation, each process has its own computer.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Gunnar

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 7:50:06

[ #64 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@ppcamiga1

Quote:
PPC is fast enough that nobody cares about performance losses from the MMU.

The opposite of what you say is true.

IBM cares A LOT about making the MMU better and about avoiding MMU performance losses.
As I explained before, IBM did a lot of studies on how much performance is lost to the MMU, and collected ideas on how to improve this with a better MMU in the future.

You need to understand that MMU tables are in memory, and memory has latency.
The higher your CPU clock, the more costly this latency becomes.
This means memory loads are slow.
An MMU miss on a high-clocked PowerPC can cost you hundreds of CPU cycles!
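Why a higher clock makes the same miss more expensive is simple arithmetic: DRAM latency is fixed in nanoseconds, so the stall measured in cycles grows with the clock rate. The sketch below uses hypothetical numbers (100 ns per descriptor fetch is an illustrative value, not a measured PowerPC figure) and assumes every descriptor read goes all the way to DRAM.

```python
def walk_penalty_cycles(latency_ns, clock_ghz, dram_reads=1):
    """CPU cycles stalled by one table walk, assuming each of `dram_reads`
    descriptor fetches misses every cache and waits the full DRAM latency."""
    return round(latency_ns * clock_ghz * dram_reads)


def avg_translation_overhead(miss_rate, penalty_cycles):
    """Average extra cycles added to every memory access by TLB misses."""
    return miss_rate * penalty_cycles


if __name__ == "__main__":
    for mhz in (50, 500, 2000):
        print(f"{mhz:>5} MHz: {walk_penalty_cycles(100, mhz / 1000)} cycles per miss")
    print("overhead at 1% misses, 2 GHz:",
          avg_translation_overhead(0.01, walk_penalty_cycles(100, 2.0)), "cycles/access")
```

With the assumed 100 ns, a single-read walk costs 5 cycles at 50 MHz but 200 cycles at 2 GHz; at a 1% miss rate the latter adds about 2 cycles to every memory access on average.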

Karlos

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 12:54:59

[ #65 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@kolla

Quote:

kolla wrote:
AmigaOS has full process separation, each process has its own computer.

That's a quotable.

_________________
Doing stupid things for fun...

Karlos

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 12:58:44

[ #66 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4405
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

Quote:

@ppcamiga1

The opposite of what you say is true

That's often the case. I've come to the belief that he's a troll plant, whose job is to make sure people stay away from PPC through the fear of association.

_________________
Doing stupid things for fun...

ppcamiga1

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 18:13:49

[ #67 ]

Cult Member

Joined: 23-Aug-2015
Posts: 777
From: Unknown

@kolla

That's brilliant.

ppcamiga1

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 18:16:53

[ #68 ]

Cult Member

Joined: 23-Aug-2015
Posts: 777
From: Unknown

Let's say it another way:
the MMU in PPC works so well
that I have never heard of anyone wanting to turn it off to gain speed.

Last edited by ppcamiga1 on 23-Feb-2024 at 06:17 PM.

cdimauro

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 18:41:59

[ #69 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@cdimauro

Quote:

OK, but even column-based rendering doesn't change the situation much.

FACT: The situation is totally different with row or column rendering.

Not totally: a bit different. I'll explain it better below.
Quote:
Quote:

Second, hardware prefetchers can easily detect those regular memory-access patterns. If not, you can always use prefetch instructions to load the pages that will be processed in advance.

You misunderstand this.
We are NOT talking about data-cache misses.
We are talking about ATC misses. The ATC is NOT the data cache.

The two things are related, for obvious reasons. In fact, before a cache line can be loaded from memory, its TLB entry must already have been "resolved".

So working with the cache may also involve TLB entries (depending on the context), and this applies to the prefetch logic too (hardware or software, it doesn't matter).

Let me quote Intel's architecture manual (the third volume: System Programming Guide):

4.10.2.3 Details of TLB Use
Subject to the limitations given in the previous paragraph, the processor may cache a translation for any linear address, even if that address is not used to access memory. For example, the processor may cache translations required for prefetches and for accesses that result from speculative execution that would never actually occur in the executed code path.

The relevant parts are the ones about prefetches and speculative execution. That should be clear enough, right?
Quote:
Quote:

Third, you don't swap pages if you have enough memory (which is the case here). So you only have to map physical pages to the specific virtual pages.

Our problem has nothing to do with swapping.

I mentioned it just for completeness.
Quote:
Quote:

Fourth, you can preload all the pages used by the framebuffer, so that the pages used by the first two levels of indirection are already loaded when rendering starts. This way only the last-level pages need to be mapped each time during rendering.

You cannot do this.
The problem is that the MMU is flushed 3 times per column you render.
This means you flush/reload the complete MMU over 3000 times per screen.
This kills system performance.
The flushes also make the MMU lose all its entries for the textures it wants to load.
This means that not only writes are slow but also texture accesses.

See above: TLB entries are replaced when a new one is needed, but this is transparent thanks to the prefetch logic and doesn't impact performance much, since you have just one level to walk when the next TLB entry is missing, and this can be done in parallel (even on in-order microarchitectures) while the game is processing the current line.

If the hardware prefetcher doesn't work well (it might happen), you can always use software prefetch instructions (which are "for free" on some microarchitectures), maybe combined with the instructions that invalidate a specific TLB entry.

So, to simplify, let's say that you need 4 lines mapped with their TLB entries before you start processing the first line, because your processor takes a certain amount of time to walk the last level of the page hierarchy and finally map the proper entry.
Then your game logic should be the following when you start rendering the first line of the first column (and so on):

1) "touch" (proper prefetch instruction) the memory location of framebuffer[x, y], framebuffer[x, y + 1], framebuffer[x, y + 2]
2) invalidate the memory location at framebuffer[x, y - 1]  # This will free its TLB entry.
3) "touch" framebuffer[x, y + 3]  # This probably allocates a TLB entry
4) start processing framebuffer[x, y]
5) y += 1
6) goto 2.

This minimizes/optimizes the usage of TLB entries for rendering, because you only need 4 of them mapped to render the column. All the others can be used for the game's business logic and for the textures (as far as possible).
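The numbered steps can be sketched as a sliding window. This hypothetical helper only tracks which framebuffer lines are "touched" (mapped) at each step, assuming, as in the post, that each line needs its own TLB entry; it does not issue real prefetch or invalidation instructions.

```python
def render_column(height, window=4):
    """Walk one framebuffer column with the touch-ahead / invalidate-behind
    scheme. Only tracks which lines are mapped; returns the peak count."""
    mapped = set()
    # Step 1: touch lines y, y + 1, y + 2 (for window=4) before rendering.
    for line in range(min(window - 1, height)):
        mapped.add(line)
    peak = len(mapped)
    for y in range(height):                   # steps 5-6: y += 1, goto 2
        mapped.discard(y - 1)                 # step 2: invalidate line y - 1
        ahead = y + window - 1
        if ahead < height:
            mapped.add(ahead)                 # step 3: touch line y + 3
        if y not in mapped:                   # must have been touched earlier
            raise RuntimeError(f"line {y} was not mapped in time")
        peak = max(peak, len(mapped))
        # Step 4: draw framebuffer[x, y] here (omitted in this sketch).
    return peak


if __name__ == "__main__":
    ATC_ENTRIES = 64  # assumption: entries in the whole ATC
    used = render_column(768)
    print("framebuffer entries:", used, "left for textures:", ATC_ENTRIES - used)
```

For a 768-line column the peak is 4 entries, so under the 64-entry assumption 60 entries stay available for textures and game logic.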

I hope that it's clear.
Quote:

Quote:

Fifth... if you have full control of the system, then you can carefully manipulate the MMU descriptors in advance while rendering happens, so that the impact becomes basically negligible.

The problem is bigger than your MMU, so you cannot solve it.
This example is a case where you juggle more apples than you can handle.

See above: you need only a few "apples".
Quote:

Quote:

So there are ways to improve performance even when the MMU is enabled with the regular 4 kB pages. At least in the normal cases, where memory access is NOT totally random but rather quite regular.

You are mixing up the data cache and the MMU here.
MMU entries are never prefetched.
This does not work the way you think.

Maybe the 68080 works this way, but see Intel's documentation above.
Quote:

OK, I see that I took too much knowledge about game programming and CPUs for granted.
I need to explain where the problem is:

Let's take DOOM on the 68040 CPU as an example.
For illustration, let's say the game runs at a width of 1024 pixels.

The MMU has 64 ATC entries, so it can map 256 KB at a time (using 4 KB pages, the default mode).
Each column of the screen spans 192 pages: 3 times more than our MMU has entries!

The game needs to load texture data.
The game wants to keep as much texture data as possible in the CPU cache.
The game writes to the screen.
The game does NOT want to cache the screen.
So the game wants to set the 192 MMU pages of the screen to write-through.

The problem is that the screen needs more MMU entries than the MMU has.
This means that while painting just 1 column,
all the MMU entries are reloaded and flushed 3 times.
This makes the writing very slow.
It also makes the MMU lose all its entries for the texture memory.
So the reloading/flushing of the MMU entries not only makes the writes
stutter but also slows down texture loading.

I hope you understand the problem now, and that this example is clear to you.

The example was already clear before, but my impression is that what I had written wasn't clear enough. I hope it is now.

BTW, the data cache is in a much worse condition than the MMU. In fact, you usually have only 32 kB of it, whereas 64 TLB entries can map 256 kB.
Quote:

Let's return to the main topic:
The main topic was that today's use cases often use a lot of memory.
A lot more memory than the 040/060 MMU can handle well.

One could argue: let's make the MMU 8 times bigger and give it 512 ATC entries.
Yes, this would help.
But the price is high.
Do we really want to waste so much chip space on an inefficient MMU?

No, and that's also what modern processors do: they don't have that many entries in the TLB, partly because they split them by usage.

In fact, TLB caches usually have many more entries (because around 1 instruction in 10 is a branch, and code execution needs more attention in this case than missing entries for data). And there are dedicated TLB entries for larger pages.
Quote:

One can also argue: let's think about a smart and elegant solution to make the MMU more efficient,
so that it can handle more memory better without wasting huge amounts of resources.

Larger pages are available, and they can be mixed with the usual 4 kB pages.
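TLB reach (entries times page size) and the effect of mixing page sizes can be sketched as follows. The 2 MiB/4 KiB pair is an assumption borrowed from x86-64-style paging, not something the 68040 offers, and the greedy covering below is just one simple policy; the names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Memory a TLB can map at once: entries x page size."""
    return entries * page_size


def mappings_needed(base, size, page_sizes=(2 * MIB, 4 * KIB)):
    """Greedily cover [base, base + size) with the largest naturally aligned
    page that fits entirely inside the region. Returns {page_size: count}."""
    sizes = sorted(page_sizes, reverse=True)
    smallest = sizes[-1]
    if base % smallest or size % smallest:
        raise ValueError("region must be aligned to the smallest page size")
    counts = {p: 0 for p in sizes}
    addr, end = base, base + size
    while addr < end:
        for p in sizes:
            if addr % p == 0 and addr + p <= end:
                counts[p] += 1
                addr += p
                break
    return counts


if __name__ == "__main__":
    print(tlb_reach(64, 4 * KIB) // KIB, "KiB")    # 256 KiB
    print(tlb_reach(512, 4 * KIB) // MIB, "MiB")   # 2 MiB
    print(mappings_needed(2 * MIB, 4 * MIB + 12 * KIB))
```

A region of 4 MiB + 12 KiB starting on a 2 MiB boundary needs 5 entries with mixed pages (two large, three small) instead of 1027 with 4 KiB pages only. A 768 KiB framebuffer is smaller than one large page, so this greedy cover still needs 192 small pages unless a whole 2 MiB page is dedicated to it.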
Quote:

I like the idea of an elegant solution.
In my opinion, the Amiga was always about smart solutions.
For example, the smart Copper concept allowed the Amiga to show more colors in games.

In this case you can knock on the door of Intel's Architecture Committee chairman (it was Erik Metzger at the time) and ask whether, after 10 years, the company finally plans to file a patent for invention request #120345.
And, if not, whether the (only) author can do it on his own.

This invention is the smart solution to this problem (but only as a "side effect": its primary purpose was a completely different one), as well as to problems in other areas (for example, it can greatly improve performance in some common present-day use cases).

matthey

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 19:13:58

[ #70 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2024
From: Kansas

cdimauro Quote:

@matthey: full support isn't possible with the Amiga OS, because it requires a complete redesign of its foundations.

Shared memory can be protected with an MMU/MPU, which is memory protection. For example, the shared Kickstart/ROM and code in fast memory can be protected from stores. What memory should be protected, and how, is up to the developers and/or the user if there is modular, optional MMU support, hardware permitting. Some developers may claim AmigaOS 4 does not have full memory protection when the lack of process isolation is the real issue. Full process isolation is more difficult for AmigaOS. It looks like AmigaOS 4 was designed to allow partial process isolation, but it was never enabled. Enabling process isolation would reduce performance and may decrease compatibility, but it is surprising that there is no option to enable and debug it. New AmigaOS APIs are needed for shared memory with full process isolation. Full memory protection is already available in AmigaOS 4, though. Even ThoR's MMU libraries provide the option of full memory protection in the 68k AmigaOS, even though it is rarely used and poorly integrated into the AmigaOS.
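Memory protection without process isolation, as described above, amounts to one shared address map with per-region permissions that every task sees. The sketch below is a hypothetical model, not AmigaOS or mmu.library code: the Kickstart range $F80000-$FFFFFF is the usual 512 KB ROM location, while the code and data ranges are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int        # first address
    end: int          # one past the last address
    writable: bool


class AccessFault(Exception):
    """Raised where a real MMU would signal an access fault."""


# One map shared by every task: protection without process isolation.
REGIONS = (
    Region("kickstart", 0xF80000, 0x1000000, writable=False),  # 512 KB ROM
    Region("code", 0x200000, 0x210000, writable=False),        # hypothetical
    Region("data", 0x210000, 0x300000, writable=True),         # hypothetical
)


def check_access(addr, is_write, regions=REGIONS):
    """Return the region name for a permitted access, else raise AccessFault."""
    for r in regions:
        if r.start <= addr < r.end:
            if is_write and not r.writable:
                raise AccessFault(f"write to read-only {r.name} at ${addr:06X}")
            return r.name
    raise AccessFault(f"unmapped address ${addr:06X}")
```

Any task may read the ROM or write the data region, but a stray store into ROM or code raises a fault. Nothing here stops one task from writing another task's data, which is exactly the difference between memory protection and process isolation.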

cdimauro Quote:

Regarding the rest, the MMU is useful, but yes: you have a price to pay. Adding many more TLB entries helps reduce this cost.

However, it can still be used on real-time systems if you "lock" the TLB entries of the code and data pages needed by the critical routines.

There need to be enough TLB/ATC entries to lock for the amount of memory used. Caches may need to be locked too. One programmer mistake resulting in an MMU table walk in memory could be deadly for real-time embedded use. The reliability gained from using an MMU could easily be outweighed by the risk of a programmer mistake when locking caches. Small real-time embedded systems that don't use much memory are often better off using an MCU with an MPU and SRAM. Large real-time embedded systems using large amounts of memory are challenging to develop. The 68k AmigaOS is designed like an RTOS even though it doesn't use an MCU. Most 68k cores used were more like the Cortex-M cores often found in MCUs than a Cortex-A core with an MMU. A 68020+AGA+2MiB standard could be used to make an MCU today with 2MiB of SRAM (configured instead of 2MiB of L2), like the SiFive SoC with U74 cores. It would still be nice to have an MMU, and table walks in SRAM would allow it to be used in more embedded systems. The only disadvantage is a higher-cost SoC than a small, more specialized MCU, but transistors are cheap, especially with economies of scale. Versatility is a nice feature if the cost is low enough.
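Whether locking is feasible boils down to counting the distinct pages that the critical code and data touch and comparing that with the number of entries that can be locked. A minimal sketch, where `lockable_entries` and the region list are hypothetical parameters; regions that share a page are counted once.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def pages_spanned(start, size, page_size=PAGE_SIZE):
    """Number of pages touched by [start, start + size)."""
    if size <= 0:
        return 0
    return (start + size - 1) // page_size - start // page_size + 1


def check_lockable(regions, lockable_entries, page_size=PAGE_SIZE):
    """Count distinct pages used by the critical regions (start, size) and
    report whether they fit in the entries that can be locked."""
    pages = set()
    for start, size in regions:
        if size <= 0:
            continue
        first = start // page_size
        last = (start + size - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages), len(pages) <= lockable_entries
```

For example, a routine at 0x10000-0x12FFF and a buffer at 0x12000-0x13FFF share one page, so `check_lockable([(0x10000, 0x3000), (0x12000, 0x2000)], 8)` reports 4 pages and `True`. Note that a 2-byte object straddling a page boundary already needs two entries.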

Last edited by matthey on 23-Feb-2024 at 07:26 PM.

cdimauro

Re: Can more people becomes a productive Amiga community member?
Posted on 23-Feb-2024 21:29:37

[ #71 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@matthey

Quote:

matthey wrote:
cdimauro Quote:

@matthey: full support isn't possible with the Amiga OS, because it requires a complete redesign of its foundations.

Shared memory can be protected with an MMU/MPU, which is memory protection. For example, the shared Kickstart/ROM and code in fast memory can be protected from stores. What memory should be protected, and how, is up to the developers and/or the user if there is modular, optional MMU support, hardware permitting. Some developers may claim AmigaOS 4 does not have full memory protection when the lack of process isolation is the real issue. Full process isolation is more difficult for AmigaOS. It looks like AmigaOS 4 was designed to allow partial process isolation, but it was never enabled. Enabling process isolation would reduce performance and may decrease compatibility, but it is surprising that there is no option to enable and debug it. New AmigaOS APIs are needed for shared memory with full process isolation.

Those look like workarounds: you shouldn't need ad hoc APIs to get features like process isolation or full memory protection. They should come out of the box, as part of standard execution.

New APIs should be needed only when you have to share resources between different processes.
Quote:
Full memory protection is already available in AmigaOS 4 though.

That's big news: are you sure? I've never heard of it before. Do you have sources for this?
Quote:
Even ThoR's MMU libraries provide the option of full memory protection in the 68k AmigaOS even though it is rarely used and poorly integrated into the AmigaOS.

I don't think that a library can implement this. How does it work?
Quote:
cdimauro Quote:

Regarding the rest, the MMU is useful, but yes: you have a price to pay. Adding many more TLB entries helps reduce this cost.

However, it can still be used on real-time systems if you "lock" the TLB entries of the code and data pages needed by the critical routines.

There need to be enough TLB/ATC entries to lock for the amount of memory used. Caches may need to be locked too.

TLB entries and caches are strictly linked, as I said before, so yes: they need to be locked as well (as the Xbox 360's CPU allowed).
Quote:
One programmer mistake resulting in an MMU table walk in memory could be deadly for real-time embedded use. The reliability gained from using an MMU could easily be outweighed by the risk of a programmer mistake when locking caches.

That's expected, since we're talking about embedded systems. You already need to be extremely careful, even without a TLB/caches lock mechanism.
Quote:
Small real-time embedded systems that don't use much memory are often better off using an MCU with an MPU and SRAM.

Yes. As usual, it depends on the project's goals. Then you can propose the solutions that fit best, selecting from the various options available.
Quote:
Large real-time embedded systems using large amounts of memory are challenging to develop.

Maybe they require creative solutions.
Quote:
The 68k AmigaOS is designed like an RTOS even though it doesn't use an MCU. Most 68k cores used were more like the Cortex-M cores often found in MCUs than a Cortex-A core with an MMU.

AmigaOS isn't an RTOS, because it's impossible to guarantee execution times for its APIs, especially considering very dirty stuff like Forbid()/Disable().
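The argument can be illustrated with a simplified blocking-time bound: the worst-case interrupt response is at most the longest section that runs with interrupts disabled plus the handler's own time, and the bound disappears as soon as any such section has no known length. This is a sketch with hypothetical timings in microseconds, not an analysis of actual AmigaOS code paths; Forbid(), which stops task switching rather than interrupts, would need the same kind of analysis for task response times.

```python
def worst_case_latency_us(sections_us, handler_us):
    """Simplified bound on interrupt response time.

    sections_us: lengths of code sections that run with interrupts disabled;
    None marks a section whose length is not bounded (e.g. an application
    that can hold Disable() for as long as it likes).
    Returns None when no bound exists."""
    if any(s is None for s in sections_us):
        return None
    return max(sections_us, default=0) + handler_us


if __name__ == "__main__":
    print(worst_case_latency_us([10, 50, 5], handler_us=20))    # bounded
    print(worst_case_latency_us([10, None], handler_us=20))     # no bound
```

A certifiable RTOS has to make every such section bounded and documented; a single API that lets any program disable interrupts for an arbitrary time turns the whole bound into None.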
Quote:
A 68020+AGA+2MiB standard could be used to make an MCU today with 2MiB of SRAM (configured instead of 2MiB of L2), like the SiFive SoC with U74 cores. It would still be nice to have an MMU, and table walks in SRAM would allow it to be used in more embedded systems. The only disadvantage is a higher-cost SoC than a small, more specialized MCU, but transistors are cheap, especially with economies of scale. Versatility is a nice feature if the cost is low enough.

Sure, that could be appealing.

But the only question is: why still stick with AGA? It's junk: a horrible patch over ECS (which was itself another ugly patch over OCS).

Last but not least, bitplanes are inefficient, and proposing them again nowadays is anachronistic: that's the first thing to remove and replace with packed/chunky graphics.
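The cost difference shows up when writing a single pixel. In a chunky (packed, 8-bit) buffer a pixel is one byte; with bitplanes its 8 bits are spread over 8 separate plane bytes, so each pixel write becomes a read-modify-write in every plane. The sketch below assumes the usual Amiga convention (bit 7 of a plane byte is the leftmost pixel, plane 0 holds the least significant bit of the color index); the function names are hypothetical.

```python
def chunky_to_planar(pixels, depth=8):
    """Convert 8 chunky pixels (color indices 0..255) into `depth` plane
    bytes. Bit 7 of each plane byte is the leftmost pixel."""
    if len(pixels) != 8:
        raise ValueError("expects exactly 8 pixels (one byte per plane)")
    planes = []
    for plane in range(depth):
        byte = 0
        for i, px in enumerate(pixels):
            byte |= ((px >> plane) & 1) << (7 - i)
        planes.append(byte)
    return planes


def set_pixel_planar(planes, x, color):
    """Set pixel x (0..7) to `color` in a group of plane bytes.
    Returns how many plane bytes had to be read and rewritten."""
    mask = 0x80 >> x
    for plane in range(len(planes)):
        if (color >> plane) & 1:
            planes[plane] |= mask
        else:
            planes[plane] &= ~mask & 0xFF
    return len(planes)


def set_pixel_chunky(buf, x, color):
    """Chunky equivalent: a single byte store. Returns bytes written."""
    buf[x] = color
    return 1
```

Setting one 8-bit pixel touches 8 plane bytes in the planar case and 1 byte in the chunky case, which is why texture-mapped renderers like DOOM draw into a chunky buffer and convert it afterwards.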

matthey

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 2:06:04

[ #72 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2024
From: Kansas

cdimauro Quote:

That's big news: are you sure? I've never heard of it before. Do you have sources for this?

AmigaOS 4 can use the MMU wherever desirable so full memory protection is available. It likely has partial process isolation as well.

https://wiki.amigaos.net/wiki/Exec_Memory_Allocation#Exec_Memory_Allocation Quote:

Prior to AmigaOS 4.0, the OS did not make use of the CPU's memory management unit and used memory "as-is". That is, if you have different memory expansions plugged into your system, the memory will be seen as chunks located somewhere in the 4 gigabyte address space. Since version 4.0, the MMU will be used to "map" memory pages from their physical location to a virtual address. There are multiple reasons why this is better than using the verbatim physical addresses - among other things it reduces the effect of "memory fragmentation" and simplifies the possibility to swap currently unused memory pages to persistent storage such as a hard disk.

There are limited places where memory protection is worthwhile without process isolation.

cdimauro Quote:

I don't think that a library can implement this. How does it work?

ThoR's mmu.library provides manual memory protection, whereas automatic memory protection with process isolation performed by the OS is what developers may associate with memory protection.

http://aminet.net/package/util/libs/MMULib Quote:

The mmu.library is a basis for MMU (memory management) related functions the
MC68K family can perform. Up to now certain hacks are available that program
the MMU themselves (Enforcer,CyberGuard,GuardianAngle,SetCPU,Shapeshifter,
VMM,GigaMem...).
It's therefore not unexpected that these tools conflict with each other.
There's up to now no Os support for the MMU at all - the gap this mmu.library
fills.

The goal is to provide a basis of functions to address and program the MMU in
a hardware independent, Os friendly fashion. Hence, the new version of the
Enforcer, called MuForce, will work together with virtual memory, and others.

The mmu.library is also the basis for the virtual memory project, the
memory.library. Even though the mmu.library does not provide virtual memory
itself, it builds the basics to allow an easy implementation and to avoid the
hacks required by other implementations so far.

The memory.library is now complete and can be found in this archive.

The mmu.library is unfortunately not the official MMU standard, so there could be others. The 68k AmigaOS could adopt it and make it the standard.

cdimauro Quote:

AmigaOS isn't an RTOS, because it's impossible to guarantee execution times for its APIs, especially considering very dirty stuff like Forbid()/Disable().

Debatable. The 68k AmigaOS is not certified as an RTOS and no response timings are specified. However, it has been used as an RTOS in successful embedded systems on low-end hardware.

cdimauro Quote:

Sure, that could be appealing.

But the only question is: why still stick with AGA? It's junk: a horrible patch over ECS (which was itself another ugly patch over OCS).

Last but not least, bitplanes are inefficient, and proposing them again nowadays is anachronistic: that's the first thing to remove and replace with packed/chunky graphics.

I just specified the second major Amiga standard, which included AGA.

68000+OCS/ECS+512KiB/1MiB
68020+AGA+2MiB

I would want a higher CPU and chipset spec than this too. My point was that 2MiB of SRAM would be required, without external memory, to meet the major standard Amiga specs. This spec allows most AmigaOS software to run. Some Amiga programs may run too fast from SRAM, though.

agami

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 2:35:02

[ #73 ]

Super Member

Joined: 30-Jun-2008
Posts: 1663
From: Melbourne, Australia

@ppcamiga1

Quote:
ppcamiga1 wrote:
Let's say it another way:
the MMU in PPC works so well
that I have never heard of anyone wanting to turn it off to gain speed.

The syntax in this post, and the use of the word "brilliant" in the previous post, are very off-brand for the real @ppcamiga1.
I've suspected it for a while, but these latest slip-ups lead me to the conclusion that this account has been taken over by someone else, spurring discourse through friction.

The jig is up.

Note: The possibility that @ppcamiga1 started using Grammarly did cross my mind, but that too would be off-brand for him. He almost took pride in having an imperfect command of the written English language, which I feel stems from some misguided sense of superiority in the Polish variant of Slavic, a language second only to Russian in its dire need of a reformation.

_________________
All the way, with 68k

FairBoy

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 4:56:25

[ #74 ]

Member

Joined: 8-Jun-2020
Posts: 76
From: Unknown

@agami
Unfortunately it's fairly obvious that your account has not been taken over by somebody else.

cdimauro

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 6:53:00

[ #75 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@matthey

Quote:

matthey wrote:
cdimauro Quote:

That's big news: are you sure? I've never heard of it before. Do you have sources for this?

AmigaOS 4 can use the MMU wherever desirable so full memory protection is available. It likely has partial process isolation as well.

https://wiki.amigaos.net/wiki/Exec_Memory_Allocation#Exec_Memory_Allocation Quote:

Prior to AmigaOS 4.0, the OS did not make use of the CPU's memory management unit and used memory "as-is". That is, if you have different memory expansions plugged into your system, the memory will be seen as chunks located somewhere in the 4 gigabyte address space. Since version 4.0, the MMU will be used to "map" memory pages from their physical location to a virtual address. There are multiple reasons why this is better than using the verbatim physical addresses - among other things it reduces the effect of "memory fragmentation" and simplifies the possibility to swap currently unused memory pages to persistent storage such as a hard disk.

There are limited places where memory protection is worthwhile without process isolation.

In fact, AmigaOS4 provides no protection at all, judging by the documentation you've shared: it's just about virtual memory (i.e. mapping virtual addresses to physical ones).

Here's the relevant part, which immediately follows what you've reported above:

It is important to remember that, just like in classic AmigaOS, a single address space is used for all programs. Sometimes the mention of an MMU can lead people to assume that each process on the Amiga will have its own personal, partitioned address space. The following two programs demonstrate that, even though they are separate processes, it is possible to read and write another's memory. The memory locations are the same virtual address and that virtual address maps onto the same physical address.

So, really nothing. Even the memory allocated to hold executable code doesn't show any protection, and it's also non-swappable:

MEMF_EXECUTABLE
The memory is used to store executable PowerPC code. This is used two-fold in AmigaOS. First, it allows the system to determine if a function pointer points to real native PowerPC code as opposed to 68k code which needs to be emulated. Second, it prevents common exploits that use stack overflows to execute malicious code. Executable memory is locked by default and thus is not swappable.

So the only protection seems to be against accessing memory that wasn't allocated. A good thing, for sure, but very, very little.
Quote:
cdimauro Quote:

I don't think that a library can implement this. How does it work?

ThoR's mmu.library provides manual memory protection, whereas automatic memory protection with process isolation performed by the OS is what developers may associate with memory protection.

http://aminet.net/package/util/libs/MMULib Quote:

The mmu.library is a basis for MMU (memory management) related functions the
MC68K family can perform. Up to now certain hacks are available that program
the MMU themselves (Enforcer,CyberGuard,GuardianAngle,SetCPU,Shapeshifter,
VMM,GigaMem...).
It's therefore not unexpected that these tools conflict with each other.
There's up to now no Os support for the MMU at all - the gap this mmu.library
fills.

The goal is to provide a basis of functions to address and program the MMU in
a hardware independent, Os friendly fashion. Hence, the new version of the
Enforcer, called MuForce, will work together with virtual memory, and others.

The mmu.library is also the basis for the virtual memory project, the
memory.library. Even though the mmu.library does not provide virtual memory
itself, it builds the basics to allow an easy implementation and to avoid the
hacks required by other implementations so far.

The memory.library is now complete and can be found in this archive.

Unfortunately, mmu.library is not the official MMU standard, so there could be others. The 68k AmigaOS could adopt it and make it the standard.

Well, this library offers no protection at all "per se": it's just a collection of functions for controlling the MMU (taking into account the different models that we have: thanks, Motorola!).

This has nothing to do with an OS that implements some form of memory protection, because what to protect is entirely delegated to the applications -> Bad-by-Definition.
Quote:
cdimauro Quote:

AmigaOS isn't an RTOS, because it's impossible to guarantee the execution time of its API calls, especially considering very dirty stuff like Forbid()/Disable().

Debatable. The 68k AmigaOS is not certified as an RTOS and no response timings are specified. However, it has been used as an RTOS in successful embedded systems on low-end hardware.

Well, the fact that it has been used for some "mission critical" projects doesn't mean that AmigaOS is an RTOS. As you stated, there's no guaranteed response time for its APIs, so it's definitely not an RTOS by definition.

And it would never get such a certification, with the nasty APIs we all know.
Quote:
cdimauro Quote:

Sure, that could be appealing.

But the only question is: why still stick with AGA? It's junk: a horrible patch over ECS (which was itself an ugly patch over OCS).

Last but not really least, bitplanes are inefficient and proposing them again nowadays is anachronistic: that's the first thing to remove and replace with packed/chunky graphics.

I just specified the second major Amiga standard, which included AGA.

68000+OCS/ECS+512kiB/1MiB
68020+AGA+2MiB

I would want a higher CPU and chipset spec than this too. My point was that, without external memory, 2MiB of SRAM would be required to meet the major Amiga standard specs. This spec allows most AmigaOS software to run. Some Amiga programs may run too fast from SRAM, though.

OK, you wanted to provide some "baseline" by just using some common Amiga specs. Fair enough.


Gunnar

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 7:04:55

[ #76 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@cdimauro

Cesare Di Mauro,

Let's talk about your idea.

Using the TOUCH instruction on the game screen, as you proposed,
would result in the game loading both the ATC entry and the screen D-cache line.
This is not good.

Let me explain the game design to you again.
The game wants to use its D-cache for the textures only.
The game wants to only write to the screen.

Your proposal works against the game design.
In your proposal the game would use its cache to cache the screen.
This would work against the cache being able to cache the textures. = BAD

In your proposal the game would load huge amounts of screen data
that are not needed, which would slow the game down a lot.

Let's do the math:
Normal game design:

Amount of screen written = 700 KB
Amount of screen loaded = 0 bytes

Your idea:

The game uses TOUCH to load one cache line per pixel it writes.
This means 700,000 pixels * cache line size are loaded without being needed.
Amount of screen written = 700 KB
Amount of screen loaded = 25,000 KB

Your idea increases the memory traffic from 700 KB to 25,700 KB per frame.

I assume you can see for yourself that this does not work.
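The per-frame arithmetic above can be checked with a small Python helper. It assumes 1 byte per pixel (700,000 pixels = 700 KB) and a 32-byte cache line, and, like the post, assumes that every TOUCH fetches a separate line that is never reused for a later pixel. Under those assumptions the fetched amount is 22,400 KB, the same order of magnitude as the rounded 25,000 KB above. `frame_traffic` is a hypothetical name used only for this sketch.

```python
def frame_traffic(pixels, bytes_per_pixel, line_size, touch_each_pixel):
    """Return (bytes written, bytes loaded) for screen memory in one frame."""
    written = pixels * bytes_per_pixel
    # Post's assumption: with TOUCH, one whole cache line is fetched for
    # every pixel written, and no fetched line is reused for another pixel.
    loaded = pixels * line_size if touch_each_pixel else 0
    return written, loaded


normal = frame_traffic(700_000, 1, 32, touch_each_pixel=False)
touched = frame_traffic(700_000, 1, 32, touch_each_pixel=True)
print(normal)                      # (700000, 0)
print(touched)                     # (700000, 22400000)
print(sum(touched) / sum(normal))  # 33.0
```

With these inputs the total screen traffic grows by a factor of 33 (700 KB to 23,100 KB per frame).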


cdimauro

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 7:28:05

[ #77 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@cdimauro

Cesare Di Mauro,

Let's talk about your idea.

Using the TOUCH instruction on the game screen, as you proposed,
would result in the game loading both the ATC entry and the screen D-cache line.
This is not good.

Let me explain the game design to you again.
The game wants to use its D-cache for the textures only.
The game wants to only write to the screen.

Your proposal works against the game design.
In your proposal the game would use its cache to cache the screen.
This would work against the cache being able to cache the textures. = BAD

In your proposal the game would load huge amounts of screen data
that are not needed, which would slow the game down a lot.

Let's do the math:
Normal game design:

Amount of screen written = 700 KB
Amount of screen loaded = 0 bytes

Your idea:

The game uses TOUCH to load one cache line per pixel it writes.
This means 700,000 pixels * cache line size are loaded without being needed.
Amount of screen written = 700 KB
Amount of screen loaded = 25,000 KB

Your idea increases the memory traffic from 700 KB to 25,700 KB per frame.

I assume you can see for yourself that this does not work.

I think that we need to align on some concepts here, and the most important one is: if the game needs to write to the screen, then it has to access the memory location it writes to, and this also requires loading that data into a cache line.

So you can't use the entire data cache solely for texture data. And you definitely cannot do it, because the code will access other data besides the textures (at least the data structures that hold the information for the textures and the geometry of the scene, of course).

With this clarified, let's focus on the renderer that I explained before.

Assuming a 16-byte data cache line, a 32kB data cache has 2048 lines that can hold data from memory. Taking a 320x200 screen (just an example), rendering a column means loading 200 * 16 = 3200 bytes from the screen.

However, you still use only 4 cache lines for it and 4 TLB entries.

The "only thing" is that, yes, you need to load 3200 bytes for each column rendered, which with a 320x200 screen means about 1MB of data read to complete the scene.

But... this is unavoidable! If a game uses a column-based rendering, then there's no other way!

I mean: without using tricks like the byte selection mask on DDR memory (which was NOT available when games like Wolfenstein 3D and Doom were developed).
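A Python sketch of the column-rendering numbers, using the parameters from the post (16-byte lines, 32kB data cache, 320x200 screen, 1 byte per pixel). The 4 KiB page size, the page-aligned framebuffer at address 0 and the worst case where every column reloads all of its lines are assumptions; `column_render_stats` is a hypothetical helper. Besides the byte counts, it also counts how many distinct cache lines and pages a single column spans.

```python
def column_render_stats(width=320, height=200, bytes_per_pixel=1,
                        line_size=16, cache_size=32 * 1024, page_size=4096):
    pitch = width * bytes_per_pixel
    cache_lines = cache_size // line_size
    # Byte addresses of the pixels in column x = 0 (framebuffer assumed at 0).
    column = [y * pitch for y in range(height)]
    lines_touched = {addr // line_size for addr in column}
    pages_touched = {addr // page_size for addr in column}
    bytes_per_column = len(lines_touched) * line_size
    # Worst case from the post: every column reloads all of its lines.
    bytes_per_frame = width * bytes_per_column
    return {
        "cache_lines": cache_lines,
        "lines_per_column": len(lines_touched),
        "pages_per_column": len(pages_touched),
        "bytes_per_column": bytes_per_column,
        "bytes_per_frame": bytes_per_frame,
    }


print(column_render_stats())
```

With these inputs it reports 2048 cache lines, 3200 bytes per column and 1,024,000 bytes per frame, matching the figures above; one column spans 200 distinct lines over 16 pages of 4 KiB. How many of those stay resident is exactly what the following replies argue about.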

Did I still miss something?


Gunnar

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 8:04:06

[ #78 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@cdimauro

Quote:
I think that we need to align on some concepts here, and the most important one is: if the game needs to write to the screen, then it has to access the memory location it writes to, and this also requires loading that data into a cache line.

No, this is not true.

The 68030 and earlier 68k caches always write to memory without reading it.
They NEVER read the memory that you want to write.

The 68040 and 060 caches can run in 2 modes:
- Like the 030, only write without reading.
- Run the cache in WRITEBACK mode, where it has to preload the line.

Using the 030-style mode (write without reading) has MAJOR advantages for many games.
GFX card memory is normally set to this mode on purpose.
Zorro cards are also very often set to this mode.
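A toy Python model of the two write policies described above: without write-allocate, a write miss goes straight to memory and nothing is read; with write-allocate (the WRITEBACK behaviour described for the 68040/060), a miss first fills the whole line. The unlimited cache with no evictions, the 16-byte line and the `bytes_read_for_writes` helper are simplifying assumptions; this is not a model of any specific CPU.

```python
def bytes_read_for_writes(addresses, line_size, write_allocate):
    """Count bytes read from memory while performing byte writes."""
    cached_lines = set()  # idealized cache: nothing is ever evicted
    bytes_read = 0
    for addr in addresses:
        line = addr // line_size
        if line in cached_lines:
            continue  # hit: the write lands in the already cached line
        if write_allocate:
            bytes_read += line_size  # fill the whole line before writing into it
            cached_lines.add(line)
        # without write-allocate the write goes straight to memory: no read
    return bytes_read


row = range(320)                        # one 320-pixel row, 1 byte per pixel
column = [y * 320 for y in range(200)]  # one 200-pixel column, pitch 320

print(bytes_read_for_writes(row, 16, write_allocate=False))    # 0
print(bytes_read_for_writes(row, 16, write_allocate=True))     # 320
print(bytes_read_for_writes(column, 16, write_allocate=True))  # 3200
```

For a row, write-allocate reads back as many bytes as are written; for a column it reads a full line per pixel, while the no-allocate policy reads nothing in either case.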

Quote:

However, you still use only 4 cache lines for it and 4 TLB entries.

No, this is not how ATC and CPU caches work!

What would help you a lot is understanding how a CPU data cache works.

A CPU data cache is made of many "lines".
Let's say, for example, 1024 lines.
Let's say each line is 32 bytes.
This is called a way.

A cache can have 1 way, or sometimes several.
Commonly 1-way, 2-way or 4-way, sometimes 8-way.

The memory address is directly associated with a cache line in the way.
This means the low-order part of the address defines which line of the way it uses.

The point of this design is that the cache only needs to do a limited number of parallel compares per cache access.

Imagine you did not have this design.
Let's say you had NO ways, just 4000 cache lines.
To find a match, your cache would need to do 4000 parallel CMP operations, every cycle.

This cache check logic would be 100 times bigger than your ALU!!
This would be an impossible, ridiculous design.

So in reality, what happens is that you run over different memory addresses,
and by design each different address will access a new line.
Because of this, the CPU will lose all the D-cache content.
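The organisation described above (lines grouped into ways, part of the address selecting the line, only a few tags compared per access) can be sketched as a small set-associative cache model in Python. The geometry follows the example (1024 lines per way, 32-byte lines); 2 ways and LRU replacement within a set are assumptions for the illustration, and real 68k caches may use a different replacement policy.

```python
from collections import OrderedDict


class SetAssociativeCache:
    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets  # lines per way
        self.ways = ways
        self.line_size = line_size
        # One small LRU-ordered dict of tags per set (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def split(self, addr):
        """Split an address into (tag, set index, byte offset in the line)."""
        line_addr, offset = divmod(addr, self.line_size)
        tag, index = divmod(line_addr, self.num_sets)
        return tag, index, offset

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        tag, index, _ = self.split(addr)
        tags = self.sets[index]
        # Only the `ways` tags of this one set are compared, not every line.
        if tag in tags:
            tags.move_to_end(tag)
            return True
        if len(tags) >= self.ways:
            tags.popitem(last=False)  # evict the least recently used way
        tags[tag] = None
        return False


cache = SetAssociativeCache(num_sets=1024, ways=2, line_size=32)
print(cache.split(0x40))        # (0, 2, 0)
print(cache.split(32768 + 5))   # (1, 0, 5)
for addr in (0, 32768, 65536):  # three lines competing for set 0
    cache.access(addr)
print(cache.access(0))          # False: evicted by the third line
```

Addresses 32 KiB apart (1024 lines x 32 bytes) land in the same set, so with 2 ways a third such address evicts the first, no matter how much room the rest of the cache has.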


cdimauro

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 8:39:13

[ #79 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@cdimauro

Quote:
I think that we need to align on some concepts here, and the most important one is: if the game needs to write to the screen, then it has to access the memory location it writes to, and this also requires loading that data into a cache line.

No, this is not true.

The 68030 and earlier 68k caches always write to memory without reading it.
They NEVER read the memory that you want to write.

The 68040 and 060 caches can run in 2 modes:
- Like the 030, only write without reading.
- Run the cache in WRITEBACK mode, where it has to preload the line.

Using the 030-style mode (write without reading) has MAJOR advantages for many games.
GFX card memory is normally set to this mode on purpose.
Zorro cards are also very often set to this mode.

I think the main problem here is that we completely lost the context while discussing.

Everything started with this: https://amigaworld.net/modules/newbb/viewtopic.php?mode=viewtopic&topic_id=45171&forum=2&start=40&viewmode=flat&order=0#868116
Which I quote here for our convenience:

Well, Intel's 32-bit MMU got some enhancements (2MB and 4MB pages, NX), but it's still used more or less as it was introduced with the 80386: with 4kB pages and 2-level page tables (3 levels with PAE).

4kB pages are used even with x64 (which supports 1GB pages), with 4- or 5-level page tables.

In short: 4kB pages are the standard on x86/x64, even though bigger page sizes are possible (but software has to be specifically set up to use them), and performance is still stellar (because modern processors have more TLB entries AND data access patterns aren't purely random).

Note also the words "modern processors". And the focus was on x86's (P)MMU, which is basically the same one Intel introduced with the i386 (BTW, NX cannot be used with classic 32-bit paging, which only adds the 4MB page mode; NX requires PAE. Just to highlight that, yes: the MMU works almost like the original one even in 2024).
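For reference, this is how a linear address is split into table indexes under classic i386 paging (two levels with 4 KiB pages, or a single directory lookup for a 4 MiB PSE page) and under x64 4-level paging. A minimal Python sketch based on the bit layouts in Intel's manuals; the function names are hypothetical, and it ignores PAE, 5-level paging and canonical-address checks.

```python
def split_i386(linear):
    """Classic 32-bit paging, 4 KiB page: 10-bit directory, 10-bit table, 12-bit offset."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def split_i386_4m(linear):
    """Classic 32-bit paging, 4 MiB page (PSE): directory index + 22-bit offset."""
    return (linear >> 22) & 0x3FF, linear & 0x3FFFFF


def split_x64_4level(va):
    """4-level paging, 4 KiB page: PML4, PDPT, PD, PT indexes (9 bits each) + offset."""
    return ((va >> 39) & 0x1FF, (va >> 30) & 0x1FF, (va >> 21) & 0x1FF,
            (va >> 12) & 0x1FF, va & 0xFFF)


print([hex(x) for x in split_i386(0x12345678)])     # ['0x48', '0x345', '0x678']
print([hex(x) for x in split_i386_4m(0x12345678)])  # ['0x48', '0x345678']
print(split_x64_4level((1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5))  # (1, 2, 3, 4, 5)
```

Each level is one more table lookup on a TLB miss, which is one reason larger pages (fewer levels, more memory covered per entry) can help with TLB pressure.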

Now, the question is: are x86 (and x64) processors able to just write to memory without first reading it into a data cache line? I don't know this detail.

If not, then what I've written is valid.

If yes, then you're right, but the Doom example you cited is only partially relevant, since the processor will still use some TLB entries to translate the screen write addresses, and TLB eviction logic is usually implemented as LRU.
This means that if textures are used often, their TLB entries are also the most used ones, so they aren't the ones replaced when the processor needs a TLB entry for the line being processed.
In practice, the processor will transparently throw away the few TLB entries that were used to map the processed lines.
In short: the performance impact is not that high.
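The LRU argument can be illustrated with an idealised, fully associative TLB with true LRU replacement, in Python. This is the model assumed in the post, not how any particular ATC is built (the 68040/060 ATCs and modern TLBs are set-associative with implementation-specific replacement). Hot texture pages are touched every step, and each step also writes to some new screen pages; `LruTlb`, `texture_misses` and the page numbers are hypothetical.

```python
from collections import OrderedDict

PAGE = 4096


class LruTlb:
    """Idealized fully associative TLB/ATC with true LRU replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()  # page number -> None, ordered LRU -> MRU

    def access(self, addr):
        page = addr // PAGE
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        if len(self.pages) >= self.entries:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = None
        return False


def texture_misses(tlb_entries, texture_pages, screen_pages_per_step, steps):
    tlb = LruTlb(tlb_entries)
    misses = 0
    next_screen_page = 1000  # screen pages are far away from the texture pages
    for _ in range(steps):
        for p in range(texture_pages):          # texels read from hot pages
            if not tlb.access(p * PAGE):
                misses += 1
        for _ in range(screen_pages_per_step):  # each screen write hits a new page
            tlb.access(next_screen_page * PAGE)
            next_screen_page += 1
    return misses


print(texture_misses(16, 8, 1, 100))  # 8: only the cold misses
print(texture_misses(16, 8, 16, 10))  # 80: textures evicted every step
```

With 16 entries, 8 texture pages and one new screen page per step, the textures only take their 8 cold misses; with 16 new screen pages per step, they are evicted and missed every time. So in this model the LRU argument holds only while the number of new pages touched between texture accesses stays below the free capacity.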
Quote:
Quote:

However, you still use only 4 cache lines for it and 4 TLB entries.

No, this is not how ATC and CPU caches work!

What would help you a lot is understanding how a CPU data cache works.

A CPU data cache is made of many "lines".
Let's say, for example, 1024 lines.
Let's say each line is 32 bytes.
This is called a way.

A cache can have 1 way, or sometimes several.
Commonly 1-way, 2-way or 4-way, sometimes 8-way.

The memory address is directly associated with a cache line in the way.
This means the low-order part of the address defines which line of the way it uses.

The point of this design is that the cache only needs to do a limited number of parallel compares per cache access.

Imagine you did not have this design.
Let's say you had NO ways, just 4000 cache lines.
To find a match, your cache would need to do 4000 parallel CMP operations, every cycle.

This cache check logic would be 100 times bigger than your ALU!!
This would be an impossible, ridiculous design.

So in reality, what happens is that you run over different memory addresses,
and by design each different address will access a new line.
Because of this, the CPU will lose all the D-cache content.

It doesn't matter much, because of the LRU eviction algorithm: you'll always throw away and reload 4 (in my example) TLB entries at a time.


Gunnar

Re: Can more people becomes a productive Amiga community member?
Posted on 24-Feb-2024 10:35:16

[ #80 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@cdimauro

Quote:
LRU eviction algorithm

I can see that you think the cache works like this, but unfortunately you don't understand how CPU caches work.

Quote:
Now, the question is: are x86 (and x64) processors able to just write to memory without first reading it into a data cache line? I don't know this detail.

The general problem here is: talking and arguing about topics with ZERO personal experience.

This is always the same problem.

Some people here try to argue about topics without having any practical coding experience and without practical knowledge of how the CPU works internally.

I call this:
ARMCHAIR QUARTERBACK syndrome.

Reading or browsing through a CPU manual is OK.
Googling for info is OK.

But without some background knowledge:
- not understanding how an MMU works internally.
- not knowing how caches work.
- not fully knowing how a CPU works.
- not knowing how games are coded.

Then Google and Wikipedia only give you dangerous half-knowledge.

I have all the knowledge that you seek.
- I have built such CPUs.
- I know how the MMU works internally.
- I know how caches work internally.
- I can offer to explain to you in detail how a CPU works.

But arguing here with half knowledge, half misunderstandings and half wrong assumptions is very tiring.

Can we agree that you simply ASK if you want to know something?

Can we also agree to use one name for things, and not two at the same time?
Motorola calls the MMU's translation cache the ATC, not the TLB. Can we agree not to mix up Moto and INTEL naming?
I think this makes reading a lot clearer.

Thank you

Last edited by Gunnar on 24-Feb-2024 at 10:40 AM.



Amigaworld.net was originally founded by David Doyle