Amigaworld.net - The Amiga Computer Community Portal Website

Interesting memory allocation benchmark

itix

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 12:39:02

[ #21 ]

Elite Member

Joined: 22-Dec-2004
Posts: 3398
From: Freedom world

@Cyborg

I compiled the exe with bernd_afa's sizes, and on MorphOS those allocations are a little faster than with Fab's original sizes:

Quote:

int sizes[] = {2, 5, 11, 13, 28, 20, 44, 19, 3, 77, 33, 127, 251,
304, 111, 700, 43, 7011, 112, 1, 4000 }; /* Silly stuff, whatever :) */

1000: 14917 µs (~0.015s)
2000: 30953 µs (~0.031s)
10000: 151522 µs (~0.15s)
50000: 809008 µs (~0.81s)
100000: 1502720 µs (~1.5s)
1000000: 15111900 µs (~15s)

Quote:

int sizes[] = {2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152 }; /* Silly stuff, whatever :) */

1000: 16472 µs (~0.016s)
2000: 33878 µs (~0.034s)
10000: 168089 µs (~0.17s)
50000: 838645 µs (~0.84s)
100000: 1702309 µs (~1.7s)
1000000: 16681583 µs (~17s)

Tests were made with Pegasos 2 G4/1GHz, MorphOS 2.3 and TLSF. And I was being lazy and ran tests only once.
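
The benchmark source itself is not reproduced in this thread, so the following is only a minimal C sketch of what such a loop might look like. It assumes each iteration allocates every entry of the size table once and then frees them all (21 allocations per iteration). The function name run_benchmark and the use of the standard clock() timer are illustrative choices, not taken from Fab's program.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Size table copied from the post above (bernd_afa's values). */
const int sizes[] = {2, 5, 11, 13, 28, 20, 44, 19, 3, 77, 33, 127, 251,
                     304, 111, 700, 43, 7011, 112, 1, 4000};
#define NUM_SIZES (sizeof sizes / sizeof sizes[0])

/*
 * Assumed loop shape: allocate every size once, then free them all.
 * Returns the elapsed CPU time in microseconds, or -1 if malloc() fails.
 * clock() measures CPU time, not wall-clock time; the original program
 * may well have used a different timer.
 */
long run_benchmark(long iterations)
{
    void *blocks[NUM_SIZES];
    clock_t start, end;
    long i;
    size_t j, k;

    assert(iterations > 0);
    start = clock();
    for (i = 0; i < iterations; i++) {
        for (j = 0; j < NUM_SIZES; j++) {
            blocks[j] = malloc((size_t)sizes[j]);
            if (blocks[j] == NULL) {
                for (k = 0; k < j; k++)
                    free(blocks[k]);
                return -1;
            }
            /* Touch the block so the allocation is not optimised away. */
            *(volatile char *)blocks[j] = 0;
        }
        for (j = 0; j < NUM_SIZES; j++)
            free(blocks[j]);
    }
    end = clock();
    return (long)((double)(end - start) * 1000000.0 / CLOCKS_PER_SEC);
}
```

Calling run_benchmark(1000), run_benchmark(2000) and so on would give one figure per iteration count, in the same layout as the tables above.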

_________________
Amiga Developer
Amiga 500, Efika, Mac Mini and PowerBook

Status: Offline

Cyborg

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 12:46:50

[ #22 ]

Regular Member

Joined: 26-Nov-2003
Posts: 424
From: Germany

@itix

As said, I don't have the MOS SDK, so I had to try with a 68k version, which of course has a bit of overhead.

Now, where is the MOS SDK to download? I looked for it, but all I found were dead sites or links..

_________________
Regards, Cyborg.
AmigaOS4 development team member

"In the beginning was CAOS.."
-- Andy Finkel, 1988 (ViewPort article, Oct. 1993)

Status: Offline

itix

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 12:49:48

[ #23 ]

Elite Member

Joined: 22-Dec-2004
Posts: 3398
From: Freedom world

@Cyborg

It seems it is unavailable at the moment. For the time being it is probably better to install VBCC until the SDK is online again.

_________________
Amiga Developer
Amiga 500, Efika, Mac Mini and PowerBook

Status: Offline

ChrisH

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 13:12:23

[ #24 ]

Elite Member

Joined: 30-Jan-2005
Posts: 6679
From: Unknown

Quote:
I hate coding in C, so I rewrote yours in E (actually PortablE), and compiled it for various OSes:
http://cshandley.co.uk/temp/membench/

The first MOS executable did not work, because I compiled it with an experimental version of PortablE. I have now uploaded new MOS executables which should work...

Last edited by ChrisH on 19-Sep-2009 at 01:13 PM.

_________________
Author of the PortablE programming language.
It is pitch black. You are likely to be eaten by a grue...

Status: Offline

umisef

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 13:19:13

[ #25 ]

Super Member

Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Cyborg

Quote:
Quite a difference, huh?

It's still about 7 times slower than doing the same allocation/deallocation patterns under Ubuntu on the SheevaPlug that drives a couple of USB webcams in my home.

7 times slower, for small allocations, than a full Unix running on a mobile phone CPU. That's disgraceful!

Quote:
Anyway.. this just to show you that there is absolutely no big fat problem in the memory allocation algorithms

There certainly is *something* amiss. The original OS4 run on what I assume is a Peg2 is 9 times slower than my SheevaPlug even when I force Linux to do an mmap/munmap system call pair for each allocation (by setting MALLOC_MMAP_THRESHOLD_=1).

This is a full Unix, with full memory protection, being forced to manipulate the process' MMU table for each allocation, running on a mobile phone CPU, beating out by almost an order of magnitude a single-address space OS that should not need to manipulate MMU tables at all for this benchmark, running on a CPU that has several times more raw power.
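
For anyone wanting to reproduce the forced-mmap run from inside the program rather than through the environment variable, here is a small sketch. It assumes glibc, where mallopt() with M_MMAP_THRESHOLD controls the same parameter as MALLOC_MMAP_THRESHOLD_. The wrapper name force_mmap_allocations is made up for illustration, and on other C libraries it simply reports failure.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h>   /* mallopt(), M_MMAP_THRESHOLD (glibc-specific) */
#endif

/*
 * Ask glibc to serve every malloc() request of one byte or more with its
 * own mmap(), so that free() turns into munmap().
 * Returns 1 if the threshold was set, 0 if unsupported or refused.
 */
int force_mmap_allocations(void)
{
#ifdef __GLIBC__
    return mallopt(M_MMAP_THRESHOLD, 1) == 1;
#else
    return 0;
#endif
}
```

It would have to be called once, before the first timed allocation. glibc also caps the number of simultaneous mmap-backed blocks (M_MMAP_MAX), which should not matter when only 21 blocks are alive at a time.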

Status: Offline

Fab

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 13:22:02

[ #26 ]

Super Member

Joined: 17-Mar-2004
Posts: 1178
From: Unknown

@Cyborg

Well, as I stated in the comment, those sizes were not meant to be particularly relevant.
That said, in practice, you can also get such large allocations, and more often than you think (big apps like OWB, and other C++ monsters, emulators, whatever). Of course, they wouldn't happen at such a rate, but still, don't underestimate it.

I can't test on OS4 right now, but the results of the modified version suggest that the OS4 allocator performs relatively badly with big chunks. It would need to be benchmarked separately to see how bad it is. :)

But with this more favorable benchmark, OS4 SLAB allocator is a bit faster than the quite slow MorphOS older allocator, but about 7 times slower than MorphOS TLSF.

Last edited by Fab on 19-Sep-2009 at 01:37 PM.

Status: Offline

ChrisH

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 13:36:07

[ #27 ]

Elite Member

Joined: 30-Jan-2005
Posts: 6679
From: Unknown

@umisef Quote:
It's still about 7 times slower than doing the same allocation/deallocation patterns under Ubuntu

While I disagree with comparing apples to oranges, since different CPUs make such comparisons almost meaningless, I would like to know:

* How long does a single allocation + deallocation take on your Mobile Ubuntu system?
* What is its CPU speed? (in MHz)
* What is its memory bus speed?

Then we might be able to have a rational/intelligent discussion! (Not necessarily any more meaningful, but we can try...)

EDIT: It appears that MOS is 7 times faster than OS4 anyway, so presumably that makes MOS the same speed as your Ubuntu. Would still like answers to my questions though.

Last edited by ChrisH on 19-Sep-2009 at 01:50 PM.
Last edited by ChrisH on 19-Sep-2009 at 01:38 PM.
Last edited by ChrisH on 19-Sep-2009 at 01:37 PM.

_________________
Author of the PortablE programming language.
It is pitch black. You are likely to be eaten by a grue...

Status: Offline

ChrisH

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 13:48:53

[ #28 ]

Elite Member

Joined: 30-Jan-2005
Posts: 6679
From: Unknown

@Fab Quote:
with this more favorable benchmark, OS4 SLAB allocator is a bit faster than the quite slow MorphOS older allocator, but about 7 times slower than MorphOS TLSF.

Now that we have more reasonable sounding results, it does seem that OS4's allocator is still a lot slower. This may well explain why PortablE still produces faster executables when I have its custom super-fast allocator enabled (which is the default).

Hopefully someone at Hyperion can look into this, because the Slab allocator should be *faster* than TLSF. (They are both O(1), but the Slab allocator should have a lower constant overhead than TLSF.) I wonder if OS4.1's virtual memory system is to blame? Hopefully something more mundane than that (such as a bug - Slab allocators are rather complex).
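
To illustrate why a slab-style cache should have such a low constant overhead, here is a minimal C sketch of a fixed-size object cache backed by a free list. It is not OS4's implementation, and all names (obj_cache, cache_alloc and so on) are hypothetical. A real slab allocator adds multiple slabs per cache, per-size caches and locking on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free object stores the link to the next free object inside itself. */
typedef struct free_node { struct free_node *next; } free_node;

/* One cache serves objects of a single fixed size. */
typedef struct {
    void      *slab;       /* backing memory for all objects */
    free_node *free_list;  /* singly linked list of free objects */
} obj_cache;

/* Carve `count` objects of `obj_size` bytes out of one slab. 0 on success. */
int cache_init(obj_cache *c, size_t obj_size, size_t count)
{
    size_t i;

    assert(c != NULL && count > 0);
    /* Every object must be able to hold a free_node, pointer-aligned. */
    if (obj_size < sizeof(free_node))
        obj_size = sizeof(free_node);
    obj_size = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);

    c->slab = malloc(obj_size * count);
    if (c->slab == NULL)
        return -1;
    c->free_list = NULL;
    for (i = 0; i < count; i++) {
        free_node *n = (free_node *)((char *)c->slab + i * obj_size);
        n->next = c->free_list;
        c->free_list = n;
    }
    return 0;
}

/* O(1): pop the head of the free list (NULL when the slab is exhausted). */
void *cache_alloc(obj_cache *c)
{
    free_node *n = c->free_list;
    if (n != NULL)
        c->free_list = n->next;
    return n;
}

/* O(1): push the object back onto the free list. */
void cache_free(obj_cache *c, void *p)
{
    free_node *n = p;
    n->next = c->free_list;
    c->free_list = n;
}

/* Release the slab itself. */
void cache_destroy(obj_cache *c)
{
    free(c->slab);
    c->slab = NULL;
    c->free_list = NULL;
}
```

Both the allocation and the release path are a couple of pointer writes with no searching, which is the behaviour the O(1) claim refers to.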

edit: Tomppeli reports that deallocation is a LOT slower than allocation on OS4.1. I wonder why - perhaps that indicates where the bug is?

BTW, I'm not sure why OS4 has such poor performance on very large allocations. Presumably they bypass the Slab allocator, and go straight to a lower-level allocator under that. VMem is not so memory efficient as TLSF, so they may go straight to some OS3-like linked-list system. What Hyperion SHOULD do is replace VMem with TLSF - should not be hard, since I implemented TLSF in one day from scratch (took a bit longer to debug fully though!), then even large memory allocations can be super-fast.
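
For readers unfamiliar with TLSF, the core of it is a two-level mapping from a request size to a free list, which is what keeps even large allocations O(1). The sketch below follows the mapping as described in the TLSF literature. It is not the MorphOS or PortablE code, and the constant SL_INDEX_LOG2 and the function names are illustrative assumptions. Real implementations find the highest bit with a single count-leading-zeros instruction and treat very small sizes separately.

```c
#include <assert.h>
#include <stddef.h>

#define SL_INDEX_LOG2 5   /* 32 second-level lists per power of two (assumed) */

/* Index of the highest set bit, e.g. 1 -> 0, 700 -> 9. Returns -1 for 0. */
int highest_bit(size_t size)
{
    int bit = -1;
    while (size != 0) {
        size >>= 1;
        bit++;
    }
    return bit;
}

/*
 * First level (fl):  which power-of-two range the size falls into.
 * Second level (sl): which of the 32 equal slices of that range it is in.
 * Only valid for size >= 32 in this sketch.
 */
void tlsf_mapping(size_t size, int *fl, int *sl)
{
    assert(size >= ((size_t)1 << SL_INDEX_LOG2));
    *fl = highest_bit(size);
    *sl = (int)((size >> (*fl - SL_INDEX_LOG2)) - ((size_t)1 << SL_INDEX_LOG2));
}
```

With these constants, a 700-byte request maps to fl = 9, sl = 11, and a 7011-byte request to fl = 12, sl = 22. A pair of bitmaps over (fl, sl) then lets the allocator find a non-empty list of suitable size without walking any list.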

Last edited by ChrisH on 19-Sep-2009 at 01:57 PM.

_________________
Author of the PortablE programming language.
It is pitch black. You are likely to be eaten by a grue...

Status: Offline

umisef

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 14:29:59

[ #29 ]

Super Member

Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@ChrisH

Quote:
How long does a single allocation + deallocation take on your Mobile Ubuntu system?

Using Bernd's size values, a million iterations take 15.4 seconds, 21 allocations per iteration --- so about 750ns.
Here is the output...
1000 iterations: Elapsed time: 13393 us (0.013393 s)
2000 iterations: Elapsed time: 26583 us (0.026583 s)
10000 iterations: Elapsed time: 130578 us (0.130578 s)
50000 iterations: Elapsed time: 855696 us (0.855696 s)
100000 iterations: Elapsed time: 1506454 us (1.506454 s)
1000000 iterations: Elapsed time: 15482070 us (15.482070 s)
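
The per-allocation figure follows directly from the last line of that output: elapsed time divided by (iterations × 21 allocations). A small C helper for the arithmetic is sketched here; the function name is made up for illustration.

```c
#include <assert.h>

/* Average cost of one allocation + deallocation pair, in nanoseconds. */
double ns_per_allocation(double elapsed_us, double iterations,
                         double allocs_per_iteration)
{
    assert(iterations > 0 && allocs_per_iteration > 0);
    return elapsed_us * 1000.0 / (iterations * allocs_per_iteration);
}
```

For the million-iteration run, ns_per_allocation(15482070, 1000000, 21) comes to roughly 737 ns, in line with the "about 750ns" quoted above.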

(Oh, and the SheevaPlug itself is not mobile; It just uses a mobile phone CPU).

Quote:
What is its CPU speed? (in MHz)

It's a 1.2GHz "Kirkwood" ARM CPU (or rather, SoC).

Of course, GHz is not a measure of performance. In actual use, this thing is SLOW. Many things which I have grown to think of as instantaneous, or at least close enough to not matter, actually take time. Creating ssh keys takes quite some time; starting up vi is noticeable, and unpacking a tar.gz file is just tedious. In more objective terms, it does 5.24 million OGR-NG nodes/s, and 1.3 million RC5-72 keys.

Quote:
What is its memory bus speed?

Who knows... I think the CPU supports a 400MHz, 16 bit wide DDR interface tops, but what frequency/width is actually used in the plug, I don't know.

Anyway --- if the relative merits (rather than failings) of allocators are what you want to look at, this benchmark is not particularly interesting. The allocation/deallocation patterns are very regular, and very friendly.
If you want to look at these things in the scenarios their complexity is meant to tackle, you'd need something like the source from here, which will happily fragment the memory map :)

Here is the result from the Sheevaplug:
Quote:

kittycam@ubuntu:~$ ./amemtest2 40960 1 3000000
3000000 iterations: Elapsed time: 1215286 us (1.215286 s), 0.405095 us per
kittycam@ubuntu:~$ ./amemtest2 40960 10 3000000
3000000 iterations: Elapsed time: 5795645 us (5.795645 s), 1.931882 us per
kittycam@ubuntu:~$ ./amemtest2 40960 100 3000000
3000000 iterations: Elapsed time: 3527694 us (3.527694 s), 1.175898 us per
kittycam@ubuntu:~$ ./amemtest2 40960 1000 3000000
3000000 iterations: Elapsed time: 4560545 us (4.560545 s), 1.520182 us per
kittycam@ubuntu:~$ ./amemtest2 40960 10000 3000000
3000000 iterations: Elapsed time: 7855803 us (7.855803 s), 2.618601 us per
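
The amemtest2 source is not included in this thread and the meaning of its three arguments is not stated, so the following is not that program. It is only a rough C sketch of the general idea of a fragmenting test: keep a table of live blocks and keep replacing random entries with blocks of random size, so that holes of varying sizes open up in the heap. The function name churn, its parameters and the seed are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Small deterministic LCG so that runs are repeatable. */
static unsigned long next_random(unsigned long *state)
{
    *state = (*state * 1664525UL + 1013904223UL) & 0xFFFFFFFFUL;
    return *state >> 8;
}

/*
 * Keep up to `slots` blocks alive. Each step frees one random slot and
 * refills it with a block of random size in [1, max_size].
 * Returns 0 on success, -1 if an allocation fails.
 */
int churn(size_t slots, size_t max_size, unsigned long steps)
{
    void **table;
    unsigned long state = 12345UL, s;
    size_t i;
    int rc = 0;

    assert(slots > 0 && max_size > 0);
    table = calloc(slots, sizeof *table);
    if (table == NULL)
        return -1;

    for (s = 0; s < steps; s++) {
        size_t slot = next_random(&state) % slots;
        size_t size = next_random(&state) % max_size + 1;

        free(table[slot]);              /* free(NULL) is a no-op */
        table[slot] = malloc(size);
        if (table[slot] == NULL) {
            rc = -1;
            break;
        }
    }
    for (i = 0; i < slots; i++)
        free(table[i]);
    free(table);
    return rc;
}
```

Timing a call such as churn(1000, 40960, 3000000) and dividing by the step count would give a per-operation figure comparable in spirit, though not in detail, to the "us per" column above.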

Status: Offline

Fab

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 14:33:00

[ #30 ]

Super Member

Joined: 17-Mar-2004
Posts: 1178
From: Unknown

@umisef

Thanks for pointing to a more relevant benchmark. Mine was a simple test, but it clearly exposed an abnormal slowness, which is why I left it as-is.

Status: Offline

AlexC

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 15:29:27

[ #31 ]

Super Member

Joined: 22-Jan-2004
Posts: 1300
From: City of Lost Angels, California.

@ChrisH

Quote:
I wonder if OS4.1's virtual memory system is to blame?

I think we can rule that possibility out as I got the same results using a version of the kernel without pager support.

And with the older kernel from 4.0/update4, the one without slabs, I got slightly better results but only by 4%.

_________________
AlexC's free OS4 software collection

AmigaOne XE/X1000/X5000/UAE-PPC OS4 laptop/X-10 Home Automation

Status: Offline

number6

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 15:38:34

[ #32 ]

Elite Member

Joined: 25-Mar-2005
Posts: 11588
From: In the village

@AlexC

Quote:
And with the older kernel from 4.0/update4, the one without slabs, I got slightly better results but only by 4%.

Running which test?
I ran ChrisH's tests and both OS4 versions failed under OS4 final/July.
The OS3 versions ran fine.

#6

_________________
This posting, in its entirety, represents solely the perspective of the author.
*Secrecy has served us so well*

Status: Offline

paolone

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 15:51:00

[ #33 ]

Super Member

Joined: 24-Sep-2007
Posts: 1143
From: Unknown

@Fab

Quote:
iterations - result
1000 : ~160000 µs (0.16 s)
2000 : ~320000 µs (0.32 s)
10000 : ~1600000 µs (1.6 s)
50000 : ~8200000 µs (8.2 s)
100000 : ~16200000 µs (16.2 s)
1000000 : ~17000000 µs (161 s)

Tried on AROS (VMware on a Q6600, Windows 7)

1000: 48645 µs (0.05 s)
2000: 60563 µs (0.06 s)
10000: 310146 µs (0.31 s)
50000: 1497832 µs (1.49 s)
100000: 2776329 µs (2.77 s)
1000000: 27650328 µs (27.6 s)

Later I will try on a real machine...

Status: Offline

Fab

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 15:55:38

[ #34 ]

Super Member

Joined: 17-Mar-2004
Posts: 1178
From: Unknown

@paolone

You could have quoted TLSF one instead. :)

Status: Offline

AlexC

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 15:58:47

[ #35 ]

Super Member

Joined: 22-Jan-2004
Posts: 1300
From: City of Lost Angels, California.

@number6

That was with Fab's original command linked with clib2.
Haven't tried again on the old kernels with the new ones from Chris but if they don't run it may be that they're linked with newlib or something else that's not in the upd4 kickstart/workbench.

_________________
AlexC's free OS4 software collection

AmigaOne XE/X1000/X5000/UAE-PPC OS4 laptop/X-10 Home Automation

Status: Offline

number6

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 16:08:46

[ #36 ]

Elite Member

Joined: 25-Mar-2005
Posts: 11588
From: In the village

@AlexC

Quote:
Haven't tried again on the old kernels with the new ones from Chris but if they don't run it may be that they're linked with newlib or something else that's not in the upd4 kickstart/workbench.

Possibly.
Error is elflibrary "required object is missing" if that's any help.

Fab's originals worked ok here as well.

#6

Last edited by number6 on 19-Sep-2009 at 04:09 PM.

_________________
This posting, in its entirety, represents solely the perspective of the author.
*Secrecy has served us so well*

Status: Offline

ChrisH

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 16:15:28

[ #37 ]

Elite Member

Joined: 30-Jan-2005
Posts: 6679
From: Unknown

@number6
It is probably due to a missing or outdated SObj: file.

I recently started reducing OS4 executable sizes by relying on OS4's standard SObj's, because otherwise C++ executables are so flippin large on OS4 :( . See here for more info:
http://utilitybase.com/forum/index.php?action=vthread&forum=2&topic=1805

OS4's developer docs say that a missing/outdated SObj file will be reported to the user by a window. Seems that may not be true....

Last edited by ChrisH on 19-Sep-2009 at 04:19 PM.
Last edited by ChrisH on 19-Sep-2009 at 04:18 PM.
Last edited by ChrisH on 19-Sep-2009 at 04:16 PM.

_________________
Author of the PortablE programming language.
It is pitch black. You are likely to be eaten by a grue...

Status: Offline

number6

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 16:26:34

[ #38 ]

Elite Member

Joined: 25-Mar-2005
Posts: 11588
From: In the village

@ChrisH

Quote:
OS4's developer docs say that a missing/outdated SObj file will be reported to the user by a window. Seems that may not be true....

ok. If it's any help, I compared my OS4.0 final/July Sobjs with the originals on the July install disk. Identical. So...at least this confirms I didn't install anything after the fact meant for OS4.1 only.

If there are any Sobjs issued after the July update that I should install that -are- ok to use under 4.0, or might help in testing, please let me know.

#6

Last edited by number6 on 19-Sep-2009 at 04:39 PM.
Last edited by number6 on 19-Sep-2009 at 04:27 PM.

_________________
This posting, in its entirety, represents solely the perspective of the author.
*Secrecy has served us so well*

Status: Offline

AlexC

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 17:03:23

[ #39 ]

Super Member

Joined: 22-Jan-2004
Posts: 1300
From: City of Lost Angels, California.

@number6

I don't think it's worth the trouble.
To be able to use sobj with the old update4 wb (51.x) you'd have to replace so many other components that in the end you'd end up with a 52.x setup.

It would be easier to just recompile the original from Fab using the list of sizes from Bernd if you really want to get the faster figures while using the old memory allocator. Even then I bet the difference would be insignificant compared to the newer kernels.

_________________
AlexC's free OS4 software collection

AmigaOne XE/X1000/X5000/UAE-PPC OS4 laptop/X-10 Home Automation

Status: Offline

fishy_fis

Re: Interesting memory allocation benchmark
Posted on 19-Sep-2009 17:49:34

[ #40 ]

Elite Member

Joined: 29-Mar-2004
Posts: 2159
From: Australia

@anyone who's interested

Ran the test and got the following results on my AROS native machine (core2duo e5200 @ 3.6ghz)

1000:
Elapsed time: 6077 us = 6ms
Average time: 0 us (per allocation + deallocation)
2000:
Elapsed time: 12104 us = 12ms
Average time: 0 us (per allocation + deallocation)
5000:
Elapsed time: 29287 us = 29ms
Average time: 0 us (per allocation + deallocation)
10000:
Elapsed time: 60314 us = 60ms
Average time: 0 us (per allocation + deallocation)
50000:
Elapsed time: 295660 us = 295ms
Average time: 0 us (per allocation + deallocation)
100000:
Elapsed time: 578249 us = 578ms
Average time: 0 us (per allocation + deallocation)
1000000:
Elapsed time: 5888447 us = 5888ms
Average time: 0 us (per allocation + deallocation)

Status: Offline


Amigaworld.net was originally founded by David Doyle