Amigaworld.net - The Amiga Computer Community Portal Website

For @ppcamiga1: Why MUI?

matthey

Re: For @ppcamiga1: Why MUI?
Posted on 3-Mar-2024 18:44:47

[ #21 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2752
From: Kansas

cdimauro Quote:

Matt, why are you spreading completely false information? Just to discredit OOP, which you clearly don't like?

Sarcasm aside, which my post was, I have no problem with OOP. It is being used and needs to be accounted for in hardware design. There is a real performance cost when indirect branches are used though.

https://dl.acm.org/doi/10.1145/236338.236369 Quote:

Our measurements show that on a processor resembling current superscalar designs, the C++ programs measured spend a median of 5.2% and a maximum of 29% of their time executing dispatch code. For version of the programs where every function was converted to a virtual function, the median overhead rose to 13.7% and the maximum to 47%.

Inheritance usually results in more memory being used and OOP programs are more difficult to debug. OOP code may be easier to read but traditional function calls are simpler than virtual function calls for debuggers. Even with a good source level debugger for OOP code, low level debugging is sometimes required.

cdimauro Quote:

The only relevant thing from the above link is that they switched to a functional programming language. We know that such languages, when used in their pure form (e.g.: immutability of data), scale very well -> compute in parallel (so, make use of multiple cores).

This is the ONLY relevant statement here!

However this absolutely does NOT imply that an OOP language (and OOP in general, as you falsely reported) is anti-parallel. Or even anti-modular. That's a logical fallacy!

BTW, since a functional language isn't great at defining data structures, they decided to also add a classic, imperative programming language.

Which, again, does NOT imply that an OOP language isn't good at that.

So, basically they removed an OOP language for teaching, but this means absolutely NOTHING about the assumed anti-modularity and anti-parallelability (!) of OOP. That's a completely wrong conclusion that you came up with (I don't know whether because you believe it, or on purpose).

The only other relevant thing is that their students now have to learn TWO programming languages AND they still lack OOP, which is a FUNDAMENTAL cornerstone of computer science (very widespread and widely adopted). Welcome to the nonsense...

Perhaps the original statement was more direct and clearer.

https://en.wikipedia.org/wiki/Comparison_of_programming_paradigms#Parallel_computing Quote:

Object-oriented programming is eliminated entirely from the introductory curriculum, because it is both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum. A proposed new course on object-oriented design methodology will be offered at the sophomore level for those students who wish to study this topic.

The statement above caused the controversy in the comments. The claims have been smoothed over but they are still there both in the original link and the link "Introductory Computer Science Education at Carnegie Mellon: A Dean's Perspective". It is a curious claim to classify all OOP implementations this way. I mentioned that BOOPSI is in a way modular so I don't fully agree. My criticism was sarcasm. Usually things are not so black or white as Java and C++ OOP are great and BOOPSI "from C=" is total crap.

Last edited by matthey on 03-Mar-2024 at 06:50 PM.

Status: Offline

Karlos

Re: For @ppcamiga1: Why MUI?
Posted on 3-Mar-2024 19:23:02

[ #22 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@matthey

Nothing about OO in general is "anti parallel". Far from it. Java, the most rabidly puritanical of OO languages, has had support for multiple threads since inception. Threads and asynchronous processing are a part of standard C++ these days too.

The only argument I could see is the array of structures versus the structure of arrays data organisation proposition. This matters for stuff like GPU processing. But again, that comes down to design choices of the developer and CUDA has used a subset of C++ for ages.
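
To make the array-of-structures versus structure-of-arrays point concrete, here is a minimal C++ sketch. The `ParticleAoS` and `ParticlesSoA` types and the `total_mass` functions are hypothetical names made up for illustration; nothing in the thread defines them. Both layouts are ordinary C++ classes, which is the point: the choice is a design decision, not something OO forces on you.

```cpp
#include <cassert>
#include <vector>

// Array of structures: each particle's fields sit together in memory.
struct ParticleAoS {
    float x, y, z;
    float mass;
};

// Structure of arrays: each field is stored contiguously, which tends to
// suit SIMD units and GPU threads that read one field for many elements.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Sum of masses, AoS layout: a strided walk that steps over x, y and z.
float total_mass(const std::vector<ParticleAoS>& ps) {
    float sum = 0.0f;
    for (const ParticleAoS& p : ps) sum += p.mass;
    return sum;
}

// Sum of masses, SoA layout: a dense walk over one contiguous array.
float total_mass(const ParticlesSoA& ps) {
    float sum = 0.0f;
    for (float m : ps.mass) sum += m;
    return sum;
}
```

Both overloads compute the same result; only the memory access pattern differs.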

I don't know in what context OO is alleged to be anti parallel.

_________________
Doing stupid things for fun...

Status: Offline

cdimauro

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 5:45:34

[ #23 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@matthey

Quote:

matthey wrote:
cdimauro Quote:

Matt, why are you spreading completely false information? Just to discredit OOP, which you clearly don't like?

Sarcasm aside, which my post was, I have no problem with OOP.

Oops. Sorry, I wasn't able to get it. :-/
Quote:
It is being used and needs to be accounted for in hardware design. There is a real performance cost when indirect branches are used though.

https://dl.acm.org/doi/10.1145/236338.236369 Quote:

Our measurements show that on a processor resembling current superscalar designs, the C++ programs measured spend a median of 5.2% and a maximum of 29% of their time executing dispatch code. For version of the programs where every function was converted to a virtual function, the median overhead rose to 13.7% and the maximum to 47%.

That's quite an old paper. Processors have become much better at handling indirect branches since then, although the cost is still relevant (nothing is free).

But that happens anyway in similar situations, since OOP/VMT is "just" syntactic sugar over the widespread function pointer tables.
Quote:
Inheritance usually results in more memory being used and OOP programs

It depends on how you use it, as you can see from the other comments. If you start subclassing without any real reason for doing it, then you end up having multiple VMTs in memory.

But besides that... I don't see other memory increases.
Quote:
are more difficult to debug. OOP code may be easier to read but traditional function calls are simpler than virtual function calls for debuggers.

Traditional function calls aren't the equivalent of virtual functions: calls using function pointers are the equivalent, and they suffer from the same issues. However, filling a table of pointers is a manual activity which is prone to human error, whereas VMTs are built by compilers: so, no such mistakes are possible there.

Overall, I don't find OOP applications more difficult to debug compared to other code using function pointers. Rather, the opposite.
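
A small C++ sketch of the equivalence described above, assuming nothing beyond standard C++. All names (`ShapeOps`, `ManualSquare`, `Shape`, `Square`, `area_manual`, `area_virtual`) are hypothetical and exist only for this illustration. The first half fills a function pointer table by hand; the second half lets the compiler build the table.

```cpp
#include <cassert>

// Hand-written dispatch: a table of function pointers the programmer fills in.
struct ShapeOps {
    int (*area)(const void* self);
};

struct ManualSquare {
    const ShapeOps* ops;  // plays the role of the hidden table pointer
    int side;
};

int manual_square_area(const void* self) {
    const ManualSquare* s = static_cast<const ManualSquare*>(self);
    return s->side * s->side;
}

// Leaving an entry out or pointing it at the wrong function is a mistake
// the compiler cannot catch here.
const ShapeOps kSquareOps = { manual_square_area };

// Compiler-built dispatch: same idea, but the table is generated for us.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
    int side;
};

// Both calls go through an indirect branch via a per-type table.
int area_manual(const ManualSquare& s) { return s.ops->area(&s); }
int area_virtual(const Shape& s) { return s.area(); }
```

Either way the call site loads a pointer from a table and branches through it, so the hardware cost is of the same kind.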
Quote:
Even with a good source level debugger for OOP code, low level debugging is sometimes required.

That's expected: you need support at both compiler AND debugger level when you have new features (whatever they are).
Quote:
cdimauro Quote:

The only relevant thing from the above link is that they switched to a functional programming language. We know that such languages, when used in their pure form (e.g.: immutability of data), scale very well -> compute in parallel (so, make use of multiple cores).

This is the ONLY relevant statement here!

However this absolutely does NOT imply that an OOP language (and OOP in general, as you falsely reported) is anti-parallel. Or even anti-modular. That's a logical fallacy!

BTW, since a functional language isn't great at defining data structures, they decided to also add a classic, imperative programming language.

Which, again, does NOT imply that an OOP language isn't good at that.

So, basically they removed an OOP language for teaching, but this means absolutely NOTHING about the assumed anti-modularity and anti-parallelability (!) of OOP. That's a completely wrong conclusion that you came up with (I don't know whether because you believe it, or on purpose).

The only other relevant thing is that their students now have to learn TWO programming languages AND they still lack OOP, which is a FUNDAMENTAL cornerstone of computer science (very widespread and widely adopted). Welcome to the nonsense...

Perhaps the original statement was more direct and clearer.

https://en.wikipedia.org/wiki/Comparison_of_programming_paradigms#Parallel_computing Quote:

Object-oriented programming is eliminated entirely from the introductory curriculum, because it is both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum. A proposed new course on object-oriented design methodology will be offered at the sophomore level for those students who wish to study this topic.

OK, now it is, but... it was completely removed from the original page. I assume that it was due to the comments (I read all of them yesterday evening, instead of spending my time replying again to Gunnar, who has entered the boring propaganda+parrot mode).
Quote:
The statement above caused the controversy in the comments. The claims have been smoothed over but they are still there both in the original link and the link "Introductory Computer Science Education at Carnegie Mellon: A Dean's Perspective". It is a curious claim to classify all OOP implementations this way.

As I've said, I haven't found them, at least when reading the original link. In some comments I found fragments of them, so they were surely there in the original claim.

I think that they were removed because of the comments, since those statements were NOT backed by rigorous studies.

IMO the professor was/is a (usual) fanatic of functional programming. We know very well that functional languages are very good at parallel programming, but this does NOT justify the statements against OOP: they were really ridiculous.
Quote:
I mentioned that BOOPSI is in a way modular so I don't fully agree. My criticism was sarcasm.

Got it. Now. And I agree on the modularity of BOOPSI, at least.
Quote:
Usually things are not so black or white as Java and C++ OOP are great and BOOPSI "from C=" is total crap.

Well, here I don't agree: BOOPSI really is total crap.

@Karlos

Quote:

Karlos wrote:
@matthey

Nothing about OO in general is "anti parallel". Far from it. Java, the most rabidly puritanical of OO languages, has had support for multiple threads since inception. Threads and asynchronous processing are a part of standard C++ these days too.

The only argument I could see is the array of structures versus the structure of arrays data organisation proposition. This matters for stuff like GPU processing. But again, that comes down to design choices of the developer and CUDA has used a subset of C++ for ages.

I don't know in what context OO is alleged to be anti parallel.

+2.

Status: Offline

ppcamiga1

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 7:05:07

[ #24 ]

Super Member

Joined: 23-Aug-2015
Posts: 1017
From: Unknown

@cdimauro

no matter what you wrote.
Amiga GUI which means MUI based on BOOPSI is still one and only thing from amiga os that still has some value.
everything below amiga gui and graphics should be cut off and replaced by unix.
as long as you don't have working mui clone you have nothing.
and there are no reasons to switch to x86/arm.
no compatible amiga gui just use windows/android on x86/arm.

Status: Offline

Karlos

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 8:00:42

[ #25 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@ppcamiga1

What does MUI do that any modern GUI does not? You seem to think it has an important edge as the only thing worth saving.

@cdimauro/Matthey

Worth mentioning that you only get function tables for classes in C++ when you explicitly use polymorphism. Subclassing alone does not necessarily imply it. A key design aim for C++ was to have efficient concrete classes, so everything is a regular function unless explicitly made virtual. Thus the tables contain entries only for those functions. All the other functions behave normally and can be inlined etc. Contrast this to languages like Java where all (non-static) methods are virtual.
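
The point about concrete classes can be checked with a short C++ sketch. The class names (`Base`, `Derived`, `PolyBase`) and the two size helpers are hypothetical, chosen only for this example. The sizes are reported rather than hard-coded because the exact values depend on the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Plain subclassing: no virtual functions, so no table and no hidden pointer.
struct Base {
    int value;
    int get() const { return value; }  // regular function, can be inlined
};

struct Derived : Base {
    int twice() const { return 2 * value; }
};

// Only an explicit `virtual` makes the class polymorphic.
struct PolyBase {
    int value;
    virtual ~PolyBase() = default;
    virtual int get() const { return value; }
};

static_assert(!std::is_polymorphic<Derived>::value,
              "subclassing alone does not create a function table");
static_assert(std::is_polymorphic<PolyBase>::value,
              "a virtual function makes the class polymorphic");

// On mainstream ABIs the polymorphic class is larger, because each object
// carries a hidden pointer to its table.
constexpr std::size_t plain_size() { return sizeof(Derived); }
constexpr std::size_t poly_size() { return sizeof(PolyBase); }
```

On typical compilers `plain_size()` equals `sizeof(Base)`, while `poly_size()` is bigger by roughly one pointer plus any alignment padding.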

_________________
Doing stupid things for fun...

Status: Offline

ppcamiga1

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 15:44:30

[ #26 ]

Super Member

Joined: 23-Aug-2015
Posts: 1017
From: Unknown

@Karlos

mui is only thing worth saving.
we used it 30 years ago on Amiga
and even with its drawbacks it is still good enough

Status: Offline

Karlos

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 16:11:41

[ #27 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@ppcamiga1

But you are hell bent on replacing everything that matters more from the perspective of an "Amiga", but want to keep a third party UI widget set that has a 1980s implementation? Even MorphOS has moved towards Objective-C (which I despise as a language, but it does have the required features to be a good fit for this sort of work).

_________________
Doing stupid things for fun...

Status: Offline

kolla

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 16:20:29

[ #28 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3475
From: Trondheim, Norway

@ppcamiga1

MUI cannot really work on modern operating systems. Like ALL amiga libraries, muimaster.library and classes are shared in memory among all the software using them. On a modern OS, each and every program would have to have its own copy of mui libs and classes in its memory space. MUI for a modern OS would look and behave a lot like Qt and GTK.

So the question remains, exactly what concepts from MUI do you want to bring to a modern OS?

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Status: Offline

ppcamiga1

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 17:10:25

[ #29 ]

Super Member

Joined: 23-Aug-2015
Posts: 1017
From: Unknown

@Karlos

from developer pov
rest is too outdated
only mui has some value

Status: Offline

ppcamiga1

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 17:11:06

[ #30 ]

Super Member

Joined: 23-Aug-2015
Posts: 1017
From: Unknown

@kolla

do it as unix .so

Status: Offline

Karlos

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 17:40:04

[ #31 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@ppcamiga1

Do what as a .so? What is it about MUI you want to see ported? It would be easier to start something new from scratch than to "port MUI", so what is it from MUI you want to replicate?

_________________
Doing stupid things for fun...

Status: Offline

jPV

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 18:16:29

[ #32 ]

Cult Member

Joined: 11-Apr-2005
Posts: 840
From: .fi

I love being able to configure MUI on a per-application basis. Users can configure the look of different programs for different purposes and situations, and they're not restricted to what the coder wants to do. It's not just one generic config/theme for all programs.

Last edited by jPV on 04-Mar-2024 at 06:17 PM.

_________________
- The wiki based MorphOS Library - Your starting point for MorphOS
- Software made by jPV^RNO

Status: Offline

Karlos

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 19:03:25

[ #33 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@jPV

There are two schools of thought on that. I can see merit in both arguments for and against. Consistency of UX being the obvious one against.

I do like MUI's "it's up to you, mate" approach though.

_________________
Doing stupid things for fun...

Status: Offline

matthey

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 22:27:13

[ #34 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2752
From: Kansas

cdimauro Quote:

That's quite an old paper. Processors have become much better at handling indirect branches since then, although the cost is still relevant (nothing is free).

But that happens anyway in similar situations, since OOP/VMT is "just" syntactic sugar over the widespread function pointer tables.

Newer high performance processors often have larger branch target buffers (branch caches), better indirect branch prediction, larger data caches and more aggressive OoO, all of which improve indirect branch performance. On the other hand, wider superscalar issue, longer data cache latency (load-to-use latency) and deeper pipelines with longer branch misprediction penalties suffer more from indirect branches. The paper predicts increased indirect branch overhead in the future.

https://dl.acm.org/doi/10.1145/236338.236369 Quote:

On future processors, these overheads are likely to increase moderately.

The VFT execution example from Figure 5 requires 3 loads, an add and a branch giving especially poor performance on a RISC CPU core.

1: load [object-reg + #VFToffset], table-reg
load-to-use stall (waiting for table-reg result)
2: load [table-reg + #deltaOffset], delta-reg
3: load [table-reg + #selectorOffset], method-reg
load-to-use stall (waiting for delta-reg result)
4: add object-reg, delta-reg, object-reg
5: call method-reg (dependent on object-reg result)

This code is 9 cycles on a Cortex-A53 with warm caches before the branch instruction. It is 2-3 cycles for a warm CISC CPU core.

1: load [object-reg + #VFToffset], table-reg
2: add [table-reg + #deltaOffset], object-reg (dependent on table-reg result)
3: load [table-reg + #selectorOffset], method-reg (dependent on table-reg result)
4: call method-reg (dependent on object-reg result)

Most x86(-64) CPU cores can consistently execute the code in 2 cycles. Even the 68060 can execute the code before the branch in 2-3 cycles although I believe it would suffer from lack of indirect branch prediction (it will predict JMP/JSR absolute and PC relative branches only). The paper points out good results from predicting the previous indirect branch target which is very simple.
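
For readers who prefer source code to pseudo-assembly, the five-step sequence above can be written out by hand in C++. This is only a sketch under stated assumptions: the table layout (`VftEntry` with a `delta` and a `method` field), the `Object` struct and the `dispatch` helper are hypothetical and are not how any particular compiler lays out its tables. The comments map each statement to the numbered steps of the RISC listing.

```cpp
#include <cassert>
#include <cstddef>

using Method = int (*)(void* self);

// One table entry: a this-pointer adjustment plus the method address.
struct VftEntry {
    std::ptrdiff_t delta;
    Method method;
};

struct Object {
    const VftEntry* vft;  // pointer to the per-class table
    int payload;
};

int get_payload(void* self) {
    return static_cast<Object*>(self)->payload;
}

// A one-entry table; delta is 0 because no pointer adjustment is needed here.
const VftEntry kVft[] = { {0, get_payload} };

int dispatch(Object* obj, std::size_t selector) {
    const VftEntry* table = obj->vft;                        // 1: load table pointer
    std::ptrdiff_t delta = table[selector].delta;            // 2: load delta (needs 1)
    Method method = table[selector].method;                  // 3: load method (needs 1)
    void* adjusted = reinterpret_cast<char*>(obj) + delta;   // 4: add (needs 2)
    return method(adjusted);                                 // 5: indirect call (needs 3, 4)
}
```

The dependency chain is visible in the comments: steps 2 and 3 cannot start until step 1 completes, which is where the load-to-use stalls come from.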

The paper not only mentions but explains RISC load latency (load-to-use latency).

https://dl.acm.org/doi/10.1145/236338.236369 Quote:

Both forms of dependencies may carry an execution time penalty because of pipelining. Whereas the result of arithmetic instructions usually is available in the next cycle (for a latency of one cycle), the result of a load issued in cycle i is not available until cycle i+2 or i+3 (for a load latency L of 2 or 3 cycles) on most current processors even in the case of a first-level cache hit. Thus, instructions depending on the loaded value cannot begin execution until L cycles after the load. Similarly, processors impose a branch penalty of B cycles after conditional or indirect branches: when a branch executes in cycle i (so that the branch target address becomes known), it takes B cycles to refill the processor pipeline until the first instruction after the branch reaches the execute stage of the pipeline and produces a result.

The authors don't seem to realize that CISC CPU cores can avoid the load latency and load-to-use stalls, which doesn't appear to be reflected in their P96-Pro (Pentium Pro) like simulation which performed poorly (partially due to high branch misprediction penalty as well). Low end CPU cores are going to struggle with indirect branches, especially in-order RISC cores. The answer for RISC has been expensive OoO but the less expensive SiFive U74 core design removes the load-to-use stalls like CISC cores. There is a significant hardware tax regardless for decent indirect branch support. Indirect branches are necessary sometimes besides OO code so some support is warranted with the low cost of transistors today.

cdimauro Quote:

It depends on how you use it, as you can see from the other comments. If you start subclassing without any real reason for doing it, then you end up having multiple VMTs in memory.

But besides that... I don't see other memory increases.

Yet I wouldn't be surprised if an AmigaOS Reaction gadget modular file that is 10kiB would be 20kiB for the MUI equivalent and 100kiB for the C++ equivalent. Granted, 68k C++ compilers are not very good and compiles for other architectures may be closer.

Karlos Quote:

Worth mentioning that you only get function tables for classes in C++ when you explicitly use polymorphism. Subclassing alone does not necessarily imply it. A key design aim for C++ was to have efficient concrete classes, so everything is a regular function unless explicitly made virtual. Thus the tables contain entries only for those functions. All the other functions behave normally and can be inlined etc. Contrast this to languages like Java where all (non-static) methods are virtual.

Implementation matters and there are advantages and disadvantages to each implementation. I brought C++ into the conversation for comparisons because it can be one of the more efficient OO languages depending on the programmer.

kolla Quote:

MUI cannot really work on modern operating systems. Like ALL amiga libraries, muimaster.library and classes are shared in memory among all the software using them. On a modern OS, each and every program would have to have their own copy of mui libs and classes in their memory space. MUI for a modern OS would look and behave a lot like Qt and GTK.

I expect MUI could be ported to another OS with some loss of features. Shared libraries are certainly possible. Data handling may need to be different but shared data is possible too. The base GUI handling would require much effort without intuition.library. I doubt anyone would try to port it without intuition.library which practically requires exec.library and a basic AmigaOS system.

P.S. The MUI ARexx public port per object isn't really a good idea for security reasons and uses more memory anyway.

Last edited by matthey on 04-Mar-2024 at 10:29 PM.
Last edited by matthey on 04-Mar-2024 at 10:28 PM.

Status: Offline

agami

Re: For @ppcamiga1: Why MUI?
Posted on 4-Mar-2024 23:52:22

[ #35 ]

Super Member

Joined: 30-Jun-2008
Posts: 1958
From: Melbourne, Australia

@thread

With every MUI, OCS vs. AGA, or 32-bit BE 1000x performance statement he reveals that he is just one of those consumers that likes to feel superior through their purchases.

It's not about improving platform development or architectural pragmatism. The more you try to corner him, the shorter and less explanatory are his responses.

Somewhere along the way his psyche glommed onto the equation that "different is better". In some cases that can certainly be true, but if taken to excess one is in danger of becoming a contrarian: disliking a thing simply by virtue of everyone else appearing to like that thing.

Deep down, he sincerely wishes that a team of Amiga devotees would bring about a true second act for the Amiga platform via an OS X like move.
In which case he wouldn't care if it's on ARM 64 or x64, as long as it's reminiscent of the Amiga UI/UX (in his words: MUI) and he gets to be "different" from the macOS, Linux, and Windows users a.k.a. peasants.

Last edited by agami on 04-Mar-2024 at 11:54 PM.

_________________
All the way, with 68k

Status: Offline

cdimauro

Re: For @ppcamiga1: Why MUI?
Posted on 5-Mar-2024 6:08:46

[ #36 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4438
From: Germany

@ppcamiga1

Quote:

ppcamiga1 wrote:
@cdimauro

no matter what you wrote.
Amiga GUI which means MUI based on BOOPSI is still one and only thing from amiga os that still has some value.

Wrong. MUI was NOT part of the OS, nor was it the only way to create a GUI. It was one of the most widespread toolkits, yes, but there were also others (BGUI comes immediately to mind).

Nevertheless, BOOPSI sucks a lot as an OOP implementation.
Quote:
everything below amiga gui and graphics should be cut off and replaced by unix.
as long as you don't have working mui clone you have nothing.
and there are no reasons to switch to x86/arm.
no compatible amiga gui just use windows/android on x86/arm.

@Karlos

Quote:

Karlos wrote:

@cdimauro/Matthey

Worth mentioning that you only get function tables for classes in C++ when you explicitly use polymorphism. Subclassing alone does not necessarily imply it.

Correct.
Quote:
A key design aim for C++ was to have efficient concrete classes, so everything is a regular function unless explicitly made virtual. Thus the tables contain entries only for those functions. All the other functions behave normally and can be inlined etc. Contrast this to languages like Java where all (non-static) methods are virtual.

Yes, but assuming that you're properly defining static functions and methods in your class, it's all about how good the compiler is at generating the proper data structures for the classes.

Even Turbo Pascal was efficient, from what I recall (there were low-level examples in Borland's TP 5.5 manual, even with the generated asm, showing how it worked underneath).

@kolla

Quote:

kolla wrote:
@ppcamiga1

MUI cannot really work on modern operating systems. Like ALL amiga libraries, muimaster.library and classes are shared in memory among all the software using them.

Not on AxRuntime...
Quote:
On a modern OS, each and every program would have to have its own copy of mui libs and classes in its memory space.

Not exactly. The code is shared (so, no copy), as well as read-only data. Only Data & BSS sections are unique for each process using a shared/dynamic/DLL library.

@matthey

Quote:

matthey wrote:
cdimauro Quote:

That's quite an old paper. Processors have become much better at handling indirect branches since then, although the cost is still relevant (nothing is free).

But that happens anyway in similar situations, since OOP/VMT is "just" syntactic sugar over the widespread function pointer tables.

Newer high performance processors often have larger branch target buffers (branch caches), better indirect branch prediction, larger data caches and more aggressive OoO, all of which improve indirect branch performance. On the other hand, wider superscalar issue, longer data cache latency (load-to-use latency) and deeper pipelines with longer branch misprediction penalties suffer more from indirect branches. The paper predicts increased indirect branch overhead in the future.

https://dl.acm.org/doi/10.1145/236338.236369 Quote:

On future processors, these overheads are likely to increase moderately.

It depends on the processor's implementation, but overall the execution speed of indirect branches has improved a lot even with deeper pipelines and longer penalties.
Quote:
The VFT execution example from Figure 5 requires 3 loads, an add and a branch giving especially poor performance on a RISC CPU core.

1: load [object-reg + #VFToffset], table-reg
load-to-use stall (waiting for table-reg result)
2: load [table-reg + #deltaOffset], delta-reg
3: load [table-reg + #selectorOffset], method-reg
load-to-use stall (waiting for delta-reg result)
4: add object-reg, delta-reg, object-reg
5: call method-reg (dependent on object-reg result)

This code is 9 cycles on a Cortex-A53 with warm caches before the branch instruction. It is 2-3 cycles for a warm CISC CPU core.

1: load [object-reg + #VFToffset], table-reg
2: add [table-reg + #deltaOffset], object-reg (dependent on table-reg result)
3: load [table-reg + #selectorOffset], method-reg (dependent on table-reg result)
4: call method-reg (dependent on object-reg result)

Most x86(-64) CPU cores can consistently execute the code in 2 cycles. Even the 68060 can execute the code before the branch in 2-3 cycles although I believe it would suffer from lack of indirect branch prediction (it will predict JMP/JSR absolute and PC relative branches only). The paper points out good results from predicting the previous indirect branch target which is very simple.

The paper not only mentions but explains RISC load latency (load-to-use latency).

https://dl.acm.org/doi/10.1145/236338.236369 Quote:

Both forms of dependencies may carry an execution time penalty because of pipelining. Whereas the result of arithmetic instructions usually is available in the next cycle (for a latency of one cycle), the result of a load issued in cycle i is not available until cycle i+2 or i+3 (for a load latency L of 2 or 3 cycles) on most current processors even in the case of a first-level cache hit. Thus, instructions depending on the loaded value cannot begin execution until L cycles after the load. Similarly, processors impose a branch penalty of B cycles after conditional or indirect branches: when a branch executes in cycle i (so that the branch target address becomes known), it takes B cycles to refill the processor pipeline until the first instruction after the branch reaches the execute stage of the pipeline and produces a result.

The authors don't seem to realize that CISC CPU cores can avoid the load latency and load-to-use stalls, which doesn't appear to be reflected in their P96-Pro (Pentium Pro) like simulation which performed poorly (partially due to high branch misprediction penalty as well). Low end CPU cores are going to struggle with indirect branches, especially in-order RISC cores. The answer for RISC has been expensive OoO but the less expensive SiFive U74 core design removes the load-to-use stalls like CISC cores. There is a significant hardware tax regardless for decent indirect branch support. Indirect branches are necessary sometimes besides OO code so some support is warranted with the low cost of transistors today.

Thanks for sharing it (I agree on the PPro observation).

However, as I've said before, indirect branches aren't something new. They were used well before OOP, in common scenarios. You can't avoid them.
Quote:
cdimauro Quote:

It depends on how you use it, as you can see from the other comments. If you start subclassing without any real reason for doing it, then you end up having multiple VMTs in memory.

But besides that... I don't see other memory increases.

Yet I wouldn't be surprised if an AmigaOS Reaction gadget modular file that is 10kiB would be 20kiB for the MUI equivalent and 100kiB for the C++ equivalent. Granted, 68k C++ compilers are not very good and compiles for other architectures may be closer.

I don't share the same vision, rather the opposite.

BOOPSI implies wasting a lot of data and code for achieving the same things that even the crappiest C++ (but it applies to any OOP language which has static compilation) compiler avoids.

1) You define properties/attributes by using tags -> not only you need an internal data structure for saving them, but you waste a lot of space by simply defining each member using tags.
And the code which defines such properties has to scan the property ids to catch the good one and then takes proper actions -> a lot of other space (code) wasted.

In C++ only the internal data structure is allocated and the constructor's code takes negligible code (unless you do more complex initialization. But this applies to a BOOPSI constructor as well).

2) You use tags for accessing properties/attributes -> same as before, because you need to scan the property ids and then take the proper actions to retrieve and give back the results.

In C++ it's all about the usual struct->attribute_name (usually one CPU instruction, even on RISCs). Or even struct.attribute_name, because you can also have static objects which require no memory allocation at all.

3) You use tags for calling methods -> similar to before, because you need to scan the method ids and then call the proper routine. So you end up with the usual switch/case statement, which needs instructions to implement and kills the CPU's pipeline with all those dependent comparisons, or you have to use an indirect branch anyway when the method ids are spread too widely across the 32-bit id space (this applies to the above property ids as well).

And if you have multiple implementations (e.g. method overriding) and your current class hasn't overridden the method, then you have to call the previous handler (i.e. the superclass) and repeat the same process until you finally find the method implementation.

This both kills the processor's pipeline and the size of code.

In C++ you need the VMT, which has a static size (one pointer per virtual method) and the client code is just the few instructions that you've already reported.

4) Callbacks require the classic switch/case loop, which goes in the same direction as the properties & methods, but this time it's all client-side. So the application's code needs to take care of it.

In C++ the application code only needs to provide the pointer to the callback, which will be saved in a list in memory. Very, very efficient on both the class and client side.

Summing it all up: BOOPSI sucks. A LOT...
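A minimal sketch of points 1) and 2) above, comparing a tag-scanning setter with plain member access. The tag IDs, the `Tag` and `Gadget` types and the function names are all hypothetical, made up for illustration; they are not real BOOPSI or MUI identifiers.

```cpp
#include <cstdint>

// Hypothetical tag IDs, loosely modelled on AmigaOS-style tag lists.
constexpr std::uint32_t TAG_END_MARK = 0;
constexpr std::uint32_t ATTR_WIDTH   = 0x80000001u;
constexpr std::uint32_t ATTR_HEIGHT  = 0x80000002u;

struct Tag {
    std::uint32_t id;
    std::uint32_t value;
};

struct Gadget {
    std::uint32_t width  = 0;
    std::uint32_t height = 0;
};

// Tag-based setter: walks the list and compares every id at run time.
// Returns the number of tags it recognised.
int set_attrs(Gadget& g, const Tag* tags) {
    int handled = 0;
    for (; tags->id != TAG_END_MARK; ++tags) {
        switch (tags->id) {
            case ATTR_WIDTH:  g.width  = tags->value; ++handled; break;
            case ATTR_HEIGHT: g.height = tags->value; ++handled; break;
            default: break;  // unknown tags are silently skipped
        }
    }
    return handled;
}

// Direct member access: the field offset is fixed at compile time.
inline void set_width(Gadget& g, std::uint32_t w) { g.width = w; }
```

The tag version costs a loop plus a compare chain per attribute, while the member version is typically a single store. The flip side is that the tag version keeps working if the caller passes attributes the class has never heard of.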

Status: Offline

ppcamiga1

Re: For @ppcamiga1: Why MUI?
Posted on 5-Mar-2024 6:51:40

[ #37 ]

Super Member

Joined: 23-Aug-2015
Posts: 1017
From: Unknown

@cdimauro

amiga gui which means mui is one and only thing from amiga os still usable.
rest is too outdated and should be replaced by unix.
as long as you don't have working mui clone you have nothing.
and there are no reasons to switch to x86/arm.
so stop trolling start working on mui clone.

Status: Offline

Hammer

Re: For @ppcamiga1: Why MUI?
Posted on 5-Mar-2024 7:03:04

[ #38 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6504
From: Australia

@matthey

Quote:

Most x86(-64) CPU cores can consistently execute the code in 2 cycles. Even the 68060 can execute the code before the branch in 2-3 cycles although I believe it would suffer from lack of indirect branch prediction (it will predict JMP/JSR absolute and PC relative branches only). The paper points out good results from predicting the previous indirect branch target which is very simple.

1996 paper? You got to be kidding!

Are you claiming Zen 4's and Raptor Lake's IPC didn't improve from Pentium Pro?

Last edited by Hammer on 05-Mar-2024 at 07:04 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

geit

Re: For @ppcamiga1: Why MUI?
Posted on 5-Mar-2024 10:37:07

[ #39 ]

Regular Member

Joined: 20-May-2006
Posts: 105
From: Germany

@cdimauro

Not sure what you want to achieve here.

Sure the method and attribute stuff in MUI is not very efficient.

The main issue I have with your arguments is that you count bytes and CPU cycles. With a modern CPU (not talking PPC, even though PPC helps a lot speed-wise) all these arguments are pointless.

A normal MUI UI needs the most CPU when the classes and instances are being created. On very complex windows, opening/resizing and therefore UI layout requires CPU power for a short time. A running MUI application does not require CPU at all. Interactions are (besides resizing or a runtime change of the visible part of the window) mostly irrelevant.

The memory used for data storage and class code (e.g. the switch/case) is peanuts. Remember, we count memory in gigabytes these days, and usually in double digits. Opening a single website downloads more data than a classic Amiga with expansions can hold. Not to mention the amount of memory and CPU that takes.

So in the end it does not matter if an application takes 10MB or 11MB. If you want to "save" memory like in the old days, there are other things which are more effective. Use shared libraries instead of linking stuff for example. As soon as Linux based stuff gets involved saving memory is no longer an option. In general it is better to rethink your algorithms and code base to save memory. Times where you try to press the last bit out of your code to make it smaller, are over.

A good modern compiler is also a way to optimize your result.

If you want to deal with non MUI stuff, that is fine, but keep in mind that your time is also a valid requirement. Choose the UI, which floats your boat.

I personally got stuck with GadTools for a long time in the past. I had my own framework built around it, which made UI creation easy. BoulderDäsh has one of the most complex UIs I created using that. It has 20 (!!) UI windows which you can open all in parallel and which all interact properly. In the end I switched to MUI for all my stuff, because it is so much easier. Not only can I easily recycle stuff from one application to another, but the range of features is also nice and would have been impossible with the old stuff.

The best feature in MUI is that you can embed stuff. E.g. in MorphOS each preferences panel is its own MUI class. So if an application like a word processor wants to point to the printer preferences it is not required to open the preferences tool with the printers page, but it can insert the printer settings into word processor preferences. The MorphOS installer is displaying network, time and other settings. These are the ones used when you later on open the preferences panel.

And the preferences panel also seamlessly embeds external classes, like the UI for each type of blanker.

Sure the (MUI) code could have been more efficient. But this is what we have now. It has been growing over a very long time. It also failed in many places, e.g. by having uncontrollable requirements resulting in NList and other replacements, which have been obsolete for years and should not be used in modern applications. But all in all the result is impressive and there is no competitor that even gets close.

Last edited by geit on 05-Mar-2024 at 10:41 AM.

Status: Offline

matthey

Re: For @ppcamiga1: Why MUI?
Posted on 5-Mar-2024 21:53:55

[ #40 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2752
From: Kansas

cdimauro Quote:

It depends on processors' implementation, but overall the execution speed of indirect branches improved a lot even with deeper pipelines and longer penalties.

High performance CPU cores improved "a lot" by throwing transistors at the problem, since they had already thrown transistors at OoO, but these cores are often many times larger and many times more expensive than the more popular and practical cores. The most affordable and popular in-order RISC cores still suffer from poor indirect branch performance. Indirect branch performance depends heavily on large amounts of cached data. A single VFT execution requires 3 separate data cache lines and a branch target buffer (branch cache) entry just to predict the last indirect branch target. Conditional branches have a significantly higher prediction percentage than indirect branches with just 2 bits of history data per branch. Predicting more than the last indirect branch target requires not only the branch history data but also the address of each indirect branch target. The most affordable and popular CPU cores have more limited caches than high performance cores, and a VFT execution with uncached data can take hundreds of cycles and increase jitter for these often embedded cores. Even on high performance CISC cores with huge caches, best case performance is only a few cycles more than a conditional branch, but worst case performance is closer to that of more affordable cores.
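To show where the dependent loads in a VFT call come from, here is a hand-rolled vtable sketch. It assumes a conventional single-inheritance layout with the table pointer stored in the object; the `Shape`, `VTable` and `call_area` names are hypothetical and real compilers may lay things out differently.

```cpp
// Hand-rolled vtable that makes the dependent loads of a virtual call visible.
struct Shape;
using AreaFn = int (*)(const Shape*);

struct VTable {
    AreaFn area;  // one function pointer per virtual method
};

struct Shape {
    const VTable* vptr;  // points at the table shared by all objects of a class
    int w, h;
};

int rect_area(const Shape* s) { return s->w * s->h; }
int tri_area(const Shape* s)  { return s->w * s->h / 2; }

const VTable rect_vtable{rect_area};
const VTable tri_vtable{tri_area};

int call_area(const Shape* s) {
    const VTable* vt = s->vptr;  // load 1: table pointer from the object
    AreaFn fn = vt->area;        // load 2: depends on the result of load 1
    return fn(s);                // indirect branch: depends on load 2
}
```

The object and its table usually sit in different cache lines, and the branch target is unknown until the second load completes unless a branch target buffer entry predicts it.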

cdimauro Quote:

I don't share the same vision, rather the opposite.

BOOPSI implies wasting a lot of data and code for achieving the same things that even the crappiest C++ (but it applies to any OOP language which has static compilation) compiler avoids.

1) You define properties/attributes by using tags -> not only you need an internal data structure for saving them, but you waste a lot of space by simply defining each member using tags.
And the code which defines such properties has to scan the property ids to catch the good one and then takes proper actions -> a lot of other space (code) wasted.

In C++ only the internal data structure is allocated and the constructor's code takes negligible code (unless you do more complex initialization. But this applies to a BOOPSI constructor as well).

2) You use tags for accessing properties/attributes -> same as before, because you need to scan the property ids and then take the proper actions to retrieve and give back the results.

In C++ it's all about the usual struct->attribute_name (usually one CPU instruction, even on RISCs). Or even struct.attribute_name, because you can also have static objects which require no memory allocation at all.

3) You use tags for calling methods -> similar to before, because you need to scan the method ids and then call the proper routine. So you end up with the usual switch/case statement, which needs instructions to implement and kills the CPU's pipeline with all those dependent comparisons, or you have to use an indirect branch anyway when the method ids are spread too widely across the 32-bit id space (this applies to the above property ids as well).

And if you have multiple implementations (e.g. method overriding) and your current class hasn't overridden the method, then you have to call the previous handler (i.e. the superclass) and repeat the same process until you finally find the method implementation.

This both kills the processor's pipeline and the size of code.

In C++ you need the VMT, which has a static size (one pointer per virtual method) and the client code is just the few instructions that you've already reported.

4) Callbacks require the classic switch/case loop, which goes in the same direction as the properties & methods, but this time it's all client-side. So the application's code needs to take care of it.

In C++ the application code only needs to provide the pointer to the callback, which will be saved in a list in memory. Very, very efficient on both the class and client side.

Summing it all up: BOOPSI sucks. A LOT...

AmigaOS tag lists trade performance for flexibility. It's not the best system but not the worst either. Multiple tags can be processed together with one function call avoiding multiple function call overhead. Custom tags can be chosen with base+offset where the offset is the offset into an embedded structure and the base can be subtracted to give the structure offset. The original AmigaOS structure handling was more efficient and provided lightweight OO like functionality by embedding structures in other structures. For example, a device is a library that can use the functions of a library without any indirect branches (inheritance?). Efficiency doesn't get any better than this but there are extensibility concerns when structures grow which is why tag lists were created.
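The base+offset idea can be sketched roughly as below. This is only an illustration under assumptions: `PREFS_TAG_BASE`, the `Prefs` structure and `apply_tags` are invented names, and it assumes every field is a 32-bit value.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Prefs {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t depth;
};

constexpr std::uint32_t TAG_END_MARK   = 0;
constexpr std::uint32_t PREFS_TAG_BASE = 0x80010000u;  // hypothetical base value

// A tag id is simply the base plus the field's offset inside the structure.
constexpr std::uint32_t tag_for(std::size_t offset) {
    return PREFS_TAG_BASE + static_cast<std::uint32_t>(offset);
}

constexpr std::uint32_t TAG_WIDTH  = tag_for(offsetof(Prefs, width));
constexpr std::uint32_t TAG_HEIGHT = tag_for(offsetof(Prefs, height));
constexpr std::uint32_t TAG_DEPTH  = tag_for(offsetof(Prefs, depth));

struct Tag {
    std::uint32_t id;
    std::uint32_t value;
};

// One call handles the whole list; no switch/case is needed because
// subtracting the base yields the field offset directly.
int apply_tags(Prefs& p, const Tag* tags) {
    int applied = 0;
    for (; tags->id != TAG_END_MARK; ++tags) {
        const std::uint32_t off = tags->id - PREFS_TAG_BASE;  // wraps to a huge value for foreign ids
        if (off > sizeof(Prefs) - sizeof(std::uint32_t) ||
            off % sizeof(std::uint32_t) != 0) {
            continue;  // not one of our tags
        }
        std::memcpy(reinterpret_cast<unsigned char*>(&p) + off,
                    &tags->value, sizeof(tags->value));
        ++applied;
    }
    return applied;
}
```

With this scheme the per-tag cost is a subtraction and a range check instead of a compare chain, and several attributes are still set with a single function call.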

BOOPSI performance can be criticized but the code is small, extensible, modular and likely could execute in parallel well with SMP. C++ has a performance advantage but statically linking code and data results in bloated executables which are less modular. Comparing these two is difficult because they are very different OO implementations. BOOPSI was likely designed to be small, flexible and modular while C++ was likely designed for performance while trading executable size. I believe both partially accomplished their goals. I have doubts that an OO C++ GUI implementation like Reaction or MUI would be possible on a 1MiB AmigaOS system.
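For comparison, a rough sketch of the two dispatch styles. The first half is only loosely modelled on a dispatcher-per-class design with superclass forwarding; the `Class` structure, method ids and function names are hypothetical and do not reproduce the real BOOPSI API. The second half is the plain C++ virtual equivalent.

```cpp
#include <cstdint>

// Hypothetical method ids.
constexpr std::uint32_t M_DRAW    = 1;
constexpr std::uint32_t M_HITTEST = 2;

struct Class;
using Dispatcher = int (*)(const Class* cl, std::uint32_t method_id, int arg);

struct Class {
    const Class* super;   // parent class, or nullptr for the root
    Dispatcher dispatch;  // one dispatcher function per class
};

// Forward an unhandled method to the superclass; -1 means nobody handled it.
int do_super(const Class* cl, std::uint32_t id, int arg) {
    return cl->super ? cl->super->dispatch(cl->super, id, arg) : -1;
}

int root_dispatch(const Class* cl, std::uint32_t id, int arg) {
    switch (id) {
        case M_DRAW:    return arg;
        case M_HITTEST: return 0;
        default:        return do_super(cl, id, arg);
    }
}

int button_dispatch(const Class* cl, std::uint32_t id, int arg) {
    switch (id) {
        case M_DRAW: return arg * 2;                // overridden here
        default:     return do_super(cl, id, arg);  // walk up the chain
    }
}

const Class root_class{nullptr, root_dispatch};
const Class button_class{&root_class, button_dispatch};

int do_method(const Class* cl, std::uint32_t id, int arg) {
    return cl->dispatch(cl, id, arg);
}

// C++ equivalent: the inherited hit_test slot is resolved through the table
// in one step, without walking the class chain at run time.
struct Root {
    virtual ~Root() = default;
    virtual int draw(int a) const { return a; }
    virtual int hit_test(int) const { return 0; }
};

struct Button : Root {
    int draw(int a) const override { return a * 2; }
};
```

In the dispatcher style an inherited method costs one indirect call and one switch per class level, but a class can be added or replaced at run time without recompiling its clients. The virtual version resolves any method with one table lookup, at the price of a layout fixed at compile time.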

Hammer Quote:

1996 paper? You got to be kidding!

Are you claiming Zen 4's and Raptor Lake's IPC didn't improve from Pentium Pro?

Most of the technical challenges in the paper still apply today and are explained well. Sure, aggressive OoO CPU cores with more cache than the memory of any Amiga C= sold are going to have good best case indirect branch performance, which is most of the time. I suppose everyone should pay $100 USD for a CPU instead of $1 USD? Funnily enough, among more affordable in-order CPU cores, the smaller 1994 68060 from before the paper was written looks to me like it outperforms, clock for clock on OO VFT calls, the most popular core in the world, the Cortex-A53, which is larger and much newer. It doesn't even look like the 68060 predicts the previous indirect branch target either. The main difference is the Cortex-A53's 3 cycle load-to-use penalty (from the L1 cache), which the 1996 paper talks about. Reducing or eliminating the load-to-use penalty and reducing the branch misprediction penalty should improve OO code execution performance according to the paper. There is an in-order RISC core which did this called the SiFive U74 core.

ARM Cortex-A53 (8 stage pipeline)
L1 data cache load-to-use penalty = 3 cycles
Branch misprediction penalty = 7-8 cycles

68060 (8 stage pipeline)
L1 data cache load-to-use penalty = 0 cycles
Branch misprediction penalty = 7-8 cycles

SiFive U74 (8 stage pipeline)
L1 data cache load-to-use penalty = 0 cycles
Branch misprediction penalty = 4-6 cycles
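As a back-of-the-envelope illustration of how the figures above interact, here is a toy model. It assumes a VFT call has two dependent L1 loads before the indirect branch (table pointer, then function pointer) and that each pays the full load-to-use penalty on an in-order core. It ignores dual issue, cache misses and everything else, so treat the numbers as an illustration, not a measurement.

```cpp
struct Core {
    const char* name;
    int load_to_use;     // L1 load-to-use penalty in cycles
    int mispredict_min;  // branch misprediction penalty, lower bound
    int mispredict_max;  // branch misprediction penalty, upper bound
};

// Assumption: table pointer load followed by function pointer load.
constexpr int kDependentLoads = 2;

// Stall cycles spent waiting on the dependent loads alone.
constexpr int stall_cycles(const Core& c) {
    return kDependentLoads * c.load_to_use;
}

// Load stalls plus the upper-bound misprediction penalty.
constexpr int worst_case_cycles(const Core& c) {
    return stall_cycles(c) + c.mispredict_max;
}

// Figures taken from the list above.
constexpr Core kCores[] = {
    {"Cortex-A53", 3, 7, 8},
    {"68060",      0, 7, 8},
    {"SiFive U74", 0, 4, 6},
};
```

Under these assumptions the Cortex-A53 loses 6 cycles to load-to-use stalls per call where the 68060 and U74 lose none, and the U74 additionally has the smallest penalty when the target is mispredicted.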

The use of a CISC/68060 like pipeline design for the in-order SiFive U74 core not only improves OO performance but general purpose performance too. The SiFive U74 core has a claimed 2.64 DMIPS/MHz from a CPU core that could be produced for closer to $1 USD than $100 USD while outperforming most OoO PPC cores. RISC-V instructions can only use the AG(U)/EA_Calc stage or EX/execution stage of the pipeline while CISC instructions can use both which is common. Most CISC OP mem,reg and OP reg,mem instructions can use both stages which is like being able to execute another RISC like instruction for free and it can do it in both execution pipelines at the same time where it is like being able to execute two RISC instructions for free (U74 core can only perform one mem access per cycle but more powerful 68060 core can perform two). With just 10% to 20% of these common CISC instructions, I expect 3 DMIPS/MHz from a CPU core which costs closer to $1 USD than $100 USD to produce with single core performance competitive with the best OoO PPC cores.

Last edited by matthey on 06-Mar-2024 at 08:29 AM.
Last edited by matthey on 05-Mar-2024 at 10:00 PM.

Status: Offline


Amigaworld.net was originally founded by David Doyle