Fab
Interesting memory allocation benchmark Posted on 18-Sep-2009 13:49:16
[ #1 ]
Super Member
Joined: 17-Mar-2004 Posts: 1178
From: Unknown
Quote:
Quote:
@Fab Quote: I think we already discussed it once. And what's sure is that absolutely nothing indicates SLAB would be more advanced.
Possibly we discussed it long ago, but I have since gained a better understanding of OS4.1's Slab implementation: please go and read up on the "VMem" allocator in relation to Slabs.
You will see that VMem is O(1), and therefore modern Slab implementations like OS4's are also O(1). VMem is actually very similar to TLSF, but predates it by a few years and so is not quite as clever (or as memory efficient).
I do NOT intend to debate the pros and cons of Slab allocators vs TLSF *here*. Please start another thread if you wish to do that.
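The quoted claim that slab allocation is O(1) rests on a simple idea: a slab is carved into equal-sized objects threaded onto a free list, so allocating and freeing are each a single pointer operation. Below is a minimal sketch of that idea in portable C, assuming a single size class, one slab and no locking. It is not OS4's implementation, and `struct obj_cache`, `cache_init`, `cache_alloc`, `cache_free` and `cache_destroy` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-size object cache illustrating the slab idea.
   Free objects are linked through their own first bytes. */
struct free_obj { struct free_obj *next; };

struct obj_cache {
    void *slab;              /* one contiguous block carved into objects */
    struct free_obj *free;   /* head of the free list */
    size_t obj_size;
};

/* Carve a slab of `count` objects of `size` bytes. Sizes are rounded up so
   every object can hold a free-list link and stays pointer-aligned. */
static int cache_init(struct obj_cache *c, size_t size, size_t count)
{
    size_t align = sizeof(void *);
    if (size < sizeof(struct free_obj))
        size = sizeof(struct free_obj);
    size = (size + align - 1) / align * align;

    c->slab = malloc(size * count);
    if (c->slab == NULL)
        return -1;
    c->obj_size = size;
    c->free = NULL;
    /* Push every object onto the free list: O(count), once, at setup. */
    for (size_t i = count; i-- > 0; ) {
        struct free_obj *o = (struct free_obj *)((char *)c->slab + i * size);
        o->next = c->free;
        c->free = o;
    }
    return 0;
}

/* O(1): pop the head of the free list (NULL when the slab is exhausted). */
static void *cache_alloc(struct obj_cache *c)
{
    struct free_obj *o = c->free;
    if (o != NULL)
        c->free = o->next;
    return o;
}

/* O(1): push the object back onto the free list. */
static void cache_free(struct obj_cache *c, void *p)
{
    struct free_obj *o = p;
    o->next = c->free;
    c->free = o;
}

static void cache_destroy(struct obj_cache *c)
{
    free(c->slab);
    c->slab = NULL;
    c->free = NULL;
}
```

A production slab allocator adds many size classes, several slabs per cache and a way to return empty slabs to the system; the sketch only shows why the common path takes constant time.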
So, I was bored and wrote a small, simple test. This test isn't meant to be clever at all: it just allocates memory areas and then frees them, in a loop.
Here's the link to binaries and sources: http://fabportnawak.free.fr/sillybench/
Note that multitasking isn't switched off for the test, and of course be aware that more iterations and an idle system will produce more stable results.
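The test described above boils down to a loop like the one below. This is a hypothetical reconstruction in portable C, not Fab's actual source (which is at the link): it uses standard `malloc()`/`free()` instead of the Amiga allocation calls, ISO C `clock()` instead of the Amiga timer, and an illustrative table of power-of-2 sizes whose exact values are an assumption.

```c
#include <stdlib.h>
#include <time.h>

/* Illustrative power-of-2 sizes (an assumption; the real table may differ). */
static const size_t bench_sizes[] = {
    16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768
};
#define NUM_SIZES (sizeof(bench_sizes) / sizeof(bench_sizes[0]))

/* Run `iterations` rounds of "allocate one block of every size, then free
   them all". Returns elapsed processor time in microseconds, or -1.0 if an
   allocation fails. */
static double run_membench(long iterations)
{
    void *ptr[NUM_SIZES];
    clock_t start = clock();

    for (long i = 0; i < iterations; i++) {
        for (size_t j = 0; j < NUM_SIZES; j++) {
            ptr[j] = malloc(bench_sizes[j]);
            if (ptr[j] == NULL) {
                while (j-- > 0)          /* release the blocks already obtained */
                    free(ptr[j]);
                return -1.0;
            }
            /* Touch the block so the compiler cannot drop the malloc/free pair. */
            *(volatile char *)ptr[j] = 0;
        }
        for (size_t j = 0; j < NUM_SIZES; j++)
            free(ptr[j]);
    }
    return (double)(clock() - start) * 1e6 / CLOCKS_PER_SEC;
}
```

Calling `run_membench()` with 1000, 2000, 10000 and so on gives one figure per iteration count. Note that `clock()` measures processor time rather than elapsed wall-clock time, so it is a substitute for, not a copy of, the original timing method.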
MorphOS 2.3 (SafeMemList allocator, a bit like OS3.x's but with more safety checks) with membench_morphos:
iterations : result
1000 : ~160000 µs (0.16 s)
2000 : ~320000 µs (0.32 s)
10000 : ~1600000 µs (1.6 s)
50000 : ~8200000 µs (8.2 s)
100000 : ~16200000 µs (16.2 s)
1000000 : ~161000000 µs (161 s)
-> I think that with more time, more chunks and more fragmentation, these times would increase quite a lot, given the underlying algorithm. :)
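Why fragmentation hurts a memlist-style allocator can be seen from how it searches: assuming SafeMemList keeps a classic exec-style linked list of free chunks and takes the first one that is large enough (the post only says it is "a bit like OS3.x"), each allocation may have to walk many small fragments. The sketch below is a deliberately simplified first-fit search with hypothetical names (`struct chunk`, `first_fit`); it reports how many chunks were inspected and does not split or unlink anything.

```c
#include <stddef.h>

/* Hypothetical free-chunk node: a singly linked list ordered by address. */
struct chunk {
    struct chunk *next;
    size_t size;
};

/* First-fit search: walk the list until a chunk is big enough.
   Returns the chunk (not yet split or unlinked) or NULL, and reports
   through *visited how many chunks were inspected. */
static struct chunk *first_fit(struct chunk *head, size_t want, size_t *visited)
{
    size_t n = 0;
    for (struct chunk *c = head; c != NULL; c = c->next) {
        n++;
        if (c->size >= want) {
            *visited = n;
            return c;
        }
    }
    *visited = n;
    return NULL;
}
```

With 99 small fragments in front of the only large chunk, a large request inspects all 100 nodes while a tiny request stops at the first one; that linear walk is the cost that grows as fragmentation accumulates.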
MorphOS 2.3 (TLSF allocator) with membench_morphos:
iterations : result
1000 : ~16000 µs (0.016 s)
2000 : ~32000 µs (0.032 s)
10000 : ~170000 µs (0.17 s)
50000 : ~870000 µs (0.87 s)
100000 : ~1740000 µs (1.7 s)
1000000 : ~17000000 µs (17 s)
-> No surprise: the TLSF allocator is apparently 10 times faster than the old MorphOS memory allocator in this simple scenario.
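TLSF gets its constant-time behaviour from two levels of segregated free lists indexed by bitmaps, so finding a suitable list never walks a chain. The sketch below shows only that indexing step, following the two-level mapping described in the TLSF literature with an assumed 16 second-level lists per power of two. It is not MorphOS's implementation, the names are hypothetical, and it leaves out block splitting, merging and the round-up of the request that a full allocator performs before searching.

```c
#include <stdint.h>

#define SL_BITS  4                 /* assumed: 16 second-level lists per power of two */
#define SL_COUNT (1u << SL_BITS)
#define FL_COUNT 32

/* floor(log2(x)) for x > 0, via a loop bounded by the word size. */
static unsigned floor_log2(uint32_t x)
{
    unsigned r = 0;
    while (x >>= 1)
        r++;
    return r;
}

/* First level = power-of-two range, second level = linear subdivision of
   that range. Assumes size >= SL_COUNT. */
static void tlsf_mapping(uint32_t size, unsigned *fl, unsigned *sl)
{
    *fl = floor_log2(size);
    *sl = (size >> (*fl - SL_BITS)) - SL_COUNT;
}

/* Bit fl of fl_bitmap is set when any list in row fl is non-empty;
   bit sl of sl_bitmap[fl] marks list (fl, sl) as non-empty. */
struct tlsf_index {
    uint32_t fl_bitmap;
    uint32_t sl_bitmap[FL_COUNT];
};

/* Index of the lowest set bit of a non-zero word (bounded loop). */
static unsigned lowest_bit(uint32_t x)
{
    unsigned r = 0;
    while (!(x & 1u)) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Find a non-empty list at or above (fl, sl) using only bitmap operations.
   Returns 0 and updates fl/sl on success, -1 if no suitable list exists. */
static int tlsf_find(const struct tlsf_index *ix, unsigned *fl, unsigned *sl)
{
    uint32_t sl_map = ix->sl_bitmap[*fl] & ((uint32_t)~0u << *sl);
    if (sl_map == 0) {
        uint32_t fl_map = (*fl + 1 < 32)
                              ? ix->fl_bitmap & ((uint32_t)~0u << (*fl + 1))
                              : 0;
        if (fl_map == 0)
            return -1;
        *fl = lowest_bit(fl_map);
        sl_map = ix->sl_bitmap[*fl];
    }
    *sl = lowest_bit(sl_map);
    return 0;
}
```

For example, a 100-byte request maps to list (6, 9), which covers sizes 100 to 103. If only list (8, 3) holds a free block, the search jumps straight to it through the two bitmaps, however many blocks are free elsewhere.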
OS4.1 (advanced SLAB allocator) with membench_amigaos4:
iterations : result
1000 : ~5060000 µs (5 s)
2000 : ~10120000 µs (10 s)
10000 : ~50600000 µs (50 s)
50000 : ~256000000 µs (256 s)
100000 : ~514000000 µs (514 s)
1000000 : N/A... I didn't have enough time for that. :)
-> A surprising result for the advanced SLAB allocator. It's really much, much slower than MorphOS (hundreds of times slower than TLSF!). So there seems to be a high constant allocation time, but I really can't explain it. Even the older MorphOS allocator is way faster.
Anybody have a clue why OS4 shows such slow results? I expected TLSF to be faster, but not by this magnitude. There seems to be something really wrong with the OS4 memory allocation time; or maybe the test exploits something particularly unfavorable to the OS4 allocator, I don't know.
Last edited by Fab on 18-Sep-2009 at 02:18 PM.
BaldGuy
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 14:29:35
[ #2 ]
Member
Joined: 11-Aug-2009 Posts: 28
From: Belgium
@Fab
I didn't know that OS4 was this slow.
That is horrible.
I hope Amiga and Hyperion will fix it soon.
_________________ AMIGA 500/EXT.FLOPPY AMIGA 1200/030/50MHz/FPU/SCSI AMIGA 4000/060/50MHz/SCSI/CYBERVISION AMIGA CD32 AMIGA CDTV AMIGA T-Shirt AMIGA Mousepad Commodore Underwear
zerohero
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 15:12:24
[ #3 ]
Team Member
Joined: 4-May-2004 Posts: 2524
From: Uddevalla, Sweden
@Fab
It's even slower on my A1 XE with a G4 @ 800MHz, though I have set my FSB down to 100MHz... Interesting results, though.
_________________ Common sense - So rare it's almost like a super power
kas1e
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 15:17:12
[ #4 ]
Elite Member
Joined: 11-Jan-2004 Posts: 3551
From: Russia
jPV
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 15:53:00
[ #5 ]
Cult Member
Joined: 11-Apr-2005 Posts: 830
From: .fi
With Pegasos1 and MorphOS 2.3 (TLSF):
1000: 24248 µs (~0.024 s)
2000: 62200 µs (~0.062 s)
10000: 256290 µs (~0.26 s)
50000: 1229004 µs (~1.2 s)
100000: 2454614 µs (~2.5 s)
1000000: 24336641 µs (~24 s)
I didn't bother to boot into a fresh system, so this is with a couple of hours of uptime and several IRC clients, a browser, ssh etc. running in the background :)
_________________ - The wiki based MorphOS Library - Your starting point for MorphOS - Software made by jPV^RNO
mike
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 15:53:29
[ #6 ]
Regular Member
Joined: 31-Jul-2007 Posts: 406
From: Alpha Centauri
68060 with tlsfmem:
1000 Elapsed time: 239673 µs (0.23 s)
2000 Elapsed time: 458838 µs (0.45 s)
10000 Elapsed time: 2277872 µs (2.27 s)
50000 Elapsed time: 11382361 µs (11.3 s)
100000 Elapsed time: 23131983 µs (23.1 s)
1000000 Elapsed time: 231297482 µs (231.2 s)
68060 with exec's finest:
1000 Elapsed time: 406642 µs (0.40 s)
2000 Elapsed time: 813348 µs (0.81 s)
10000 Elapsed time: 4038716 µs (4.0 s)
50000 Elapsed time: 20193021 µs (20.1 s)
100000 Elapsed time: 40599833 µs (40.5 s)
1000000 Elapsed time: 403810916 µs (403.8 s)
Oi, recompiled with gcc -O3 -m68060 -noixemul membench.c -o membench: gcc340 came to 280013-273133 µs for 1000 allocs, gcc295 came to 237650-226703 µs at best.
Last edited by mike on 18-Sep-2009 at 04:50 PM.
_________________ C= Amiga addict ,,, (Oo) ⎛☮ໄ ﮑὠՀ Couldn't care less what other people think, seeing that there's concrete evidence they don't.
bernd_afa
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 17:27:48
[ #7 ]
Cult Member
Joined: 14-Apr-2006 Posts: 829
From: Unknown
@Fab
Quote:
Anybody have a clue why OS4 shows such slow results? I expected TLSF to be faster, but not by this magnitude.
Try these sizes in the benchmark and see whether OS4 gets faster:
int sizes[] = {2, 5, 11, 13, 28, 20, 44, 19, 3, 77, 33, 127, 251, 304, 111, 700, 43, 7011, 112, 1, 4000 }; /* Silly stuff, whatever :) */
That's closer to real-world practice. The most frequent memory allocations are always in the range of 0 to 256 bytes. I wrote a small profiling tool that counts allocations below 256 bytes and above 256 bytes,
and running programs shows that allocations below 256 bytes are about 1000 times more frequent than larger ones.
Large allocations don't happen that often in reality, but when the benchmark makes such large allocations, the MMU tables must be changed and rearranged often.
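The kind of profiling described above can be approximated with a thin wrapper that buckets each request by size before forwarding it to the real allocator. This is a hypothetical sketch in portable C around `malloc()`, not bernd_afa's actual tool; `counting_malloc`, `small_allocs` and `large_allocs` are made-up names, and the 256-byte threshold is taken from the post (exactly 256 bytes is counted as large here, an arbitrary choice).

```c
#include <stdlib.h>

#define SMALL_LIMIT 256   /* threshold from the post */

static unsigned long small_allocs;   /* requests below SMALL_LIMIT bytes */
static unsigned long large_allocs;   /* requests of SMALL_LIMIT bytes or more */

/* Count the request by size class, then forward it to malloc(). */
static void *counting_malloc(size_t size)
{
    if (size < SMALL_LIMIT)
        small_allocs++;
    else
        large_allocs++;
    return malloc(size);
}
```

Passing bernd_afa's 21-entry size table through the wrapper once gives 17 small and 4 large requests; a real profiler would wrap the OS allocation calls of running programs instead.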
Tomppeli
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 17:34:00
[ #8 ]
Super Member
Joined: 18-Jun-2004 Posts: 1652
From: Home land of Santa, sauna, sisu and salmiakki
I noticed a long time ago that allocation is fast but deallocation is slow. So add reporting of the elapsed time between the allocation and deallocation loops, and rerun the test. (Also, on AmigaOS4 use AllocVecTags(); it uses the MEMF_PRIVATE flag by default, too.)
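Reporting the allocation and deallocation phases separately, as suggested, only needs an extra timestamp between the two inner loops. Below is a minimal sketch in portable C, assuming `malloc()`/`free()` and ISO C `clock()` stand in for the OS4 calls and timer (it does not show AllocVecTags()). `struct phase_times` and `run_split_bench` are hypothetical names, and all sizes must be non-zero.

```c
#include <stdlib.h>
#include <time.h>

struct phase_times {
    double alloc_us;   /* processor time spent in the allocation loops */
    double free_us;    /* processor time spent in the deallocation loops */
};

/* Alloc/free benchmark that times each phase separately.
   `count` must not exceed 64 (size of the local pointer table).
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int run_split_bench(const size_t *sizes, size_t count, long iterations,
                           struct phase_times *out)
{
    void *ptr[64];
    clock_t alloc_ticks = 0, free_ticks = 0;

    if (count > 64)
        return -1;
    for (long i = 0; i < iterations; i++) {
        clock_t t0 = clock();
        for (size_t j = 0; j < count; j++) {
            ptr[j] = malloc(sizes[j]);
            if (ptr[j] == NULL) {
                while (j-- > 0)
                    free(ptr[j]);
                return -1;
            }
            *(volatile char *)ptr[j] = 0;   /* keep the compiler from eliding it */
        }
        clock_t t1 = clock();               /* boundary between the two phases */
        for (size_t j = 0; j < count; j++)
            free(ptr[j]);
        clock_t t2 = clock();
        alloc_ticks += t1 - t0;
        free_ticks += t2 - t1;
    }
    out->alloc_us = (double)alloc_ticks * 1e6 / CLOCKS_PER_SEC;
    out->free_us = (double)free_ticks * 1e6 / CLOCKS_PER_SEC;
    return 0;
}
```

Timing each iteration keeps the memory footprint identical to the original loop. With a coarse clock, however, individual readings can round to zero, so a high-resolution timer would give more trustworthy per-phase numbers.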
Edit: I found a bug in it:
Quote:
APTR ptr[sizeof(sizes)/sizes[0]];
for(j = 0; j < sizeof(sizes)/sizeof(sizes[0]); j++)
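In the quoted declaration, the array bound divides the byte size of `sizes` by the *value* `sizes[0]` rather than by the size of one element, `sizeof(sizes[0])`. The usual fix is a small element-count macro. `ARRAY_COUNT` and `BUGGY_COUNT` below are hypothetical names, the five sizes are just sample values, and `APTR` is replaced by `void *` so the sketch compiles outside the Amiga SDK.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

static const int sizes[] = {2, 5, 11, 13, 28};

/* Correct: one pointer slot per entry in sizes[]. */
static void *ptr[ARRAY_COUNT(sizes)];

/* The buggy form from the quote: the bound depends on the value sizes[0]. */
#define BUGGY_COUNT (sizeof(sizes) / (size_t)sizes[0])
```

This also explains why the bug can be harmless: as long as `sizes[0]` is no larger than `sizeof(int)`, the buggy bound only over-allocates the pointer table, whereas a larger first entry would make it too short for the loop.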
Last edited by Tomppeli on 18-Sep-2009 at 06:14 PM.
_________________ Rock lobster bit me. My Workbench has always preferences. X1000 + AmigaOS4.1 FE "Anyone can build a fast CPU. The trick is to build a fast system." -Seymour Cray
Fab
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 18:00:07
[ #9 ]
Super Member
Joined: 17-Mar-2004 Posts: 1178
From: Unknown
@Tomppeli
There was indeed a copy/paste bug, but it didn't have any ill effect anyway, given the value of sizes[0].
But I changed it a bit. Not that it will change anything in these results, though. :)
itix
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 18:17:11
[ #10 ]
Elite Member
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world
I expected the old memlist-based memory allocator to be slower than the SLAB allocator in OS4. Even a 68k Amiga is faster...
_________________ Amiga Developer Amiga 500, Efika, Mac Mini and PowerBook
Cheese
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 21:58:54
[ #11 ]
Regular Member
Joined: 23-Oct-2006 Posts: 315
From: Unknown
Seems SLAB rhymes with ....
Last edited by Cheese on 18-Sep-2009 at 10:18 PM.
_________________ x86/MorphOS 4.0
"Delving into the past can be a dangerous exercise." -hyperionmp
"I've been a supporter of "REACTION" GUI because is an Amiga OS thing." -Snuffy
"I personally prefer a vision of do'ers and makers rather than
ssolie
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 22:46:39
[ #12 ]
Elite Member
Joined: 10-Mar-2003 Posts: 2755
From: Alberta, Canada
@Fab
Quote:
Anybody have a clue why OS4 shows such slow results?
I would suggest you email Thomas Frieden directly and discuss it. Perhaps you can help find a root cause and a solution to fix it if there is indeed a problem.
_________________ ExecSG Team Lead
pixie
Re: Interesting memory allocation benchmark Posted on 18-Sep-2009 22:56:54
[ #13 ]
Elite Member
Joined: 10-Mar-2003 Posts: 3359
From: Figueira da Foz - Portugal
@ssolie
Quote:
Perhaps you can help find a root cause and a solution to fix it if there is indeed a problem.
And what exactly led you to think there is a problem?
_________________ Indigo 3D Lounge, my second home. The Illusion of Choice | Am*ga
Samwel
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 7:44:51
[ #14 ]
Elite Member
Joined: 7-Apr-2004 Posts: 3404
From: Sweden
@pixie
Eh.. Maybe because the speed result is waaay slower than it should be?
_________________ /Harry
[SOLD] µA1-C - 750GX 800MHz - 512MB - Antec Aria case
Avatar by HNL_DK!
corto
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 8:10:06
[ #15 ]
Regular Member
Joined: 24-Apr-2004 Posts: 342
From: Grenoble (France)
@Fab
Are we sure that the memory functions do the same thing? At work we had a similar case between two Linux systems, and in the end one of them was using 'lazy allocation': the alloc function returned successfully, but each part of the allocation was only mapped at the first memory access to its page.
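Lazy allocation of this kind is easy to observe on Linux: an anonymous `mmap()` returns immediately, and physical pages are only supplied, through minor page faults, when each page is first touched. Below is a minimal Linux sketch unrelated to the Amiga code; it reads the process's minor-fault counter with `getrusage()`, and `minor_faults` and `count_touch_faults` are hypothetical names. Exact counts depend on the kernel (transparent huge pages, sandboxes that do not report faults), so treat the numbers as indicative only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Map `pages` anonymous pages, then write one byte per page. Stores the
   faults seen during mmap() and during the first touches.
   Returns 0 on success, -1 if the mapping fails. */
static int count_touch_faults(size_t pages, long *map_faults, long *touch_faults)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;

    long f0 = minor_faults();
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    long f1 = minor_faults();      /* the mapping itself normally faults nothing in */

    for (size_t i = 0; i < pages; i++)
        p[i * page] = 1;           /* first write to each page triggers the work */
    long f2 = minor_faults();

    munmap(p, len);
    *map_faults = f1 - f0;
    *touch_faults = f2 - f1;
    return 0;
}
```

On a typical Linux system, mapping 256 pages reports few or no faults, while touching them reports roughly one fault per page. An allocator benchmark that never touches its memory would therefore miss that cost entirely.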
It would be interesting to split the benchmark to know the elapsed time associated with each function.
Going by your raw results, it's true that something is wrong...
As I work on tests and benchmarks every day, I usually come back to (at least) two conclusions:
- be careful with benchmark results and early conclusions
- tests and benchmarks are both useful for improving software
Thanks Fab, you have certainly pointed out a problem.
itix
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 8:26:14
[ #16 ]
Elite Member
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world
@Samwel
Quote:
Eh.. Maybe because the speed result is waaay slower than it should be?
It should be easy to find out by running the test on OS 4.0.
ChrisH
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 9:03:40
[ #17 ]
Elite Member
Joined: 30-Jan-2005 Posts: 6679
From: Unknown
@Fab
Interesting, albeit worrying, results. I'll hopefully have time to look at them more closely later, but they might explain why E programs still perform better on OS4 with a custom super-fast allocator (as provided by AmigaE & PortablE) than when using the OS allocator directly.
It's a pity that your benchmark does not report the time per allocation (= total time / number of allocations); that would make comparisons easier, and make microseconds a more sensible unit. I changed the output to milliseconds for sanity, but for now I've left it as total elapsed time for easy comparison with your results.
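Turning a total into a per-operation figure is a simple division; the only subtlety is counting every allocation rather than every iteration. Assuming each iteration allocates and frees every entry of the size table once (21 entries in bernd_afa's table), the hypothetical helper below is consistent with the 5 µs average reported later in the thread for 1000 iterations taking 114872 µs in total.

```c
/* Average microseconds per allocation+deallocation pair, truncated to a
   whole number. total_us: elapsed time of the run; iterations: outer loop
   count; allocs_per_iteration: entries in the size table. */
static unsigned long avg_us_per_alloc(unsigned long long total_us,
                                      unsigned long iterations,
                                      unsigned long allocs_per_iteration)
{
    unsigned long long allocs =
        (unsigned long long)iterations * allocs_per_iteration;
    return allocs ? (unsigned long)(total_us / allocs) : 0;
}
```

Dividing Fab's original OS4 figures the same way would need the number of entries in his size table, which the thread does not give; per iteration, 5060000 µs / 1000 comes to about 5060 µs.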
BEWARE: the Debug Kernel added 50% to my reported times.
Last edited by ChrisH on 19-Sep-2009 at 09:05 AM.
_________________ Author of the PortablE programming language. It is pitch black. You are likely to be eaten by a grue...
Cyborg
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 9:26:34
[ #18 ]
Regular Member
Joined: 26-Nov-2003 Posts: 424
From: Germany
@Fab
bernd_afa is right. Your power-of-2 sizes are a) not relevant in practice and b)... well... no algorithm performs equally well or badly in every situation. I'm sure someone could also find a situation where the results would be reversed. Anyway, if you do what bernd_afa suggested, OS4 is many times faster. Still not faster than MOS, but there are enough pitfalls a "benchmark" can fall into to generate questionable results.
For the heck of it, here are the results with bernd_afa's changes on OS4:
1000: 113044 µs (0.113044 s)
2000: 225120 µs (0.225120 s)
10000: 1155693 µs (1.155693 s)
50000: 5807400 µs (5.807400 s)
100000: 11615699 µs (11.615699 s)
1000000: 116656246 µs (116.656250 s)
Quite a difference, huh? And that's only because more realistic allocation sizes were used instead of the original silly ;) ones.
I only tested a 68k build on MOS, because I don't have the SDK (where could I get that from?). As I said, it was still faster (the JIT doesn't really have much work to do with so little code), but a lot slower than with the original silly sizes.
Anyway... this is just to show you that there is absolutely no big fat problem in the memory allocation algorithms in OS4, and that the original "benchmark" in this thread doesn't mean anything. (And even if MOS is faster... well... so be it ;) )
_________________ Regards, Cyborg. AmigaOS4 development team member
"In the beginning was CAOS.." -- Andy Finkel, 1988 (ViewPort article, Oct. 1993)
ChrisH
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 10:54:28
[ #19 ]
Elite Member
Joined: 30-Jan-2005 Posts: 6679
From: Unknown
@Fab
I hate coding in C, so I rewrote yours in E (actually PortablE) and compiled it for various OSes: http://cshandley.co.uk/temp/membench/
As an aside: the E source code is about half the size of the C source, has no OS-specific workarounds, reports more meaningful information, and looks a hell of a lot nicer to boot :). I also compiled an AROS version for the hell of it (since it is about zero extra effort).
EDIT: I have also uploaded a bernd_afa version of the test. It is a LOT faster, as reported elsewhere!
Last edited by ChrisH on 19-Sep-2009 at 11:19 AM.
ChrisH
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 11:16:52
[ #20 ]
Elite Member
Joined: 30-Jan-2005 Posts: 6679
From: Unknown
I have now uploaded a TEST script, which makes running the benchmark incredibly easy. Here are my results using the NON-debug kernel on my 667MHz Sam440ep:
Quote:
execute membench-TEST membench-bernd_afa_OS4
1000 iterations: Elapsed time: 114872 µs = 114 ms Average time: 5 µs (per allocation + deallocation)
2000 iterations: Elapsed time: 228924 µs = 228 ms Average time: 5 µs (per allocation + deallocation)
10000 iterations: Elapsed time: 1133979 µs = 1133 ms Average time: 5 µs (per allocation + deallocation)
50000 iterations: Elapsed time: 5610080 µs = 5610 ms Average time: 5 µs (per allocation + deallocation)
100000 iterations: Elapsed time: 11195496 µs = 11195 ms Average time: 5 µs (per allocation + deallocation)
1000000 iterations: Elapsed time: 111841734 µs = 111841 ms Average time: 5 µs (per allocation + deallocation)
FWIW, I got 9 µs when using the Debug kernel.
Last edited by ChrisH on 19-Sep-2009 at 11:17 AM.