Poster | Thread |
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 18:19:00
| | [ #41 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @fishy_fis
Of course an x86 with dual-channel RAM is a lot faster, but OS4 does not run on a system with dual-channel DDR RAM. The test should only show that TLSF is the fastest allocator and that the allocation sizes do not matter.
tlsfmem always runs at the same speed on my system, whether I use the best-case allocation sizes (the values I posted) or the worst-case values from Fab.
@ChrisH
>Now that we have more reasonable sounding results,
These values are best-case values: the overall allocation is only about 8 KB, so only 2 MMU pages need to change with this constant free and alloc.
Maybe you could change the test to be more realistic, so that 1 MB is allocated and there are 100 allocations instead of 20, two of them 300 KB and 700 KB in size.
But all in all you can see that TLSF is a lot faster than Slab, even when OS4 does not need to change many MMU pages.
@umisef
>Using Bernd's size values, a million iterations take 15.4 seconds, 21 allocations per iteration --- so about 750ns.
>Here is the output...
>1000 iterations: Elapsed time: 13393 us (0.013393 s)
>2000 iterations: Elapsed time: 26583 us (0.026583 s)
>10000 iterations: Elapsed time: 130578 us (0.130578 s)
>50000 iterations: Elapsed time: 855696 us (0.855696 s)
>100000 iterations: Elapsed time: 1506454 us (1.506454 s)
>1000000 iterations: Elapsed time: 15482070 us (15.482070 s)
For the Unix test with an MMU, it would also be useful to see the results with the values from Fab.
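The "about 750ns" per allocation quoted above can be checked from the posted numbers. This is only a small arithmetic sketch; the function name and its parameters are made up for illustration and are not part of the benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Average cost of one allocation in nanoseconds, given the total
 * elapsed time in microseconds. Name and parameters are illustrative
 * only; they do not come from the benchmark source. */
static double ns_per_alloc(uint64_t elapsed_us, uint64_t iterations,
                           uint64_t allocs_per_iteration)
{
    assert(iterations > 0 && allocs_per_iteration > 0);
    return (double)elapsed_us * 1000.0 /
           ((double)iterations * (double)allocs_per_iteration);
}
```

Calling `ns_per_alloc(15482070, 1000000, 21)` with the figures from the quote gives roughly 737 ns, which agrees with the rounded value in the quote.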
Last edited by bernd_afa on 22-Sep-2009 at 05:05 PM. Last edited by bernd_afa on 19-Sep-2009 at 06:25 PM. Last edited by bernd_afa on 19-Sep-2009 at 06:22 PM. Last edited by bernd_afa on 19-Sep-2009 at 06:21 PM.
|
|
Status: Offline |
|
|
pixie
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 18:30:05
| | [ #42 ] |
|
|
|
Elite Member |
Joined: 10-Mar-2003 Posts: 3120
From: Figueira da Foz - Portugal | | |
|
| |
Status: Offline |
|
|
fishy_fis
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 19:04:10
| | [ #43 ] |
|
|
|
Elite Member |
Joined: 29-Mar-2004 Posts: 2159
From: Australia | | |
|
| @bernd_afa
Sure, of course modern hardware will be a lot faster than peg/peg2/a1/sam/etc. I only posted the results in case anyone was interested to see them, and also to compare different AROS setups (not the hardware, but how it is running; in a VM, for example, RAM latency seems to suffer greatly, although this is no real surprise). I'd be interested to see results from other AROS setups too (Linux hosted, QEMU with virtualiser, etc.). Hopefully some other AROS users who run it in a different way will also post their results. On a slightly different note, and it's not really important, just a thought: setting ideal values seems a little redundant to me. The original tests by Fab seem a little more valid than your idealised version. Writing a test to perform as optimally and cleanly as possible doesn't really tell a lot. |
|
Status: Offline |
|
|
number6
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 20:42:22
| | [ #44 ] |
|
|
|
Elite Member |
Joined: 25-Mar-2005 Posts: 11587
From: In the village | | |
|
| @Fab
Quote:
OS4.1 (advanced SLAB allocator) with membench_amigaos4:
iterations - result
1000 : ~5060000 µs (5 s)
2000 : ~10120000 µs (10 s)
10000 : ~50600000 µs (50 s)
50000 : ~256000000 µs (256 s)
100000 : ~514000000 µs (514 s)
1000000 : N/A... I hadn't enough time for that. :) |
For whatever it's worth:
YOUR tests, as opposed to Bernd's, for reasons I explained in prior post:
Micro GX - OS4.0 final+July update (representing the complete final package)
membench_amigaos4
1000: Elapsed time: 3779427 µs
2000: Elapsed time: 8000613 µs
10000: Elapsed time: 39974912 µs
50000: Elapsed time: 201559684 µs
membench_68k
1000: Elapsed time: 4084742 µs
2000: Elapsed time: 8640998 µs
10000: Elapsed time: 43231655 µs
50000: Elapsed time: 216280114 µs
#6
Last edited by number6 on 19-Sep-2009 at 08:48 PM.
_________________ This posting, in its entirety, represents solely the perspective of the author. *Secrecy has served us so well* |
|
Status: Offline |
|
|
Karlos
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 20:52:53
| | [ #45 ] |
|
|
|
Elite Member |
Joined: 24-Aug-2003 Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition! | | |
|
| Where I come from, resource allocation is always considered to be slow and expensive compared to most operations, therefore you never, ever do it in time critical code and no application should be frequently allocating and releasing resources if it can be avoided.
_________________ Doing stupid things for fun... |
|
Status: Offline |
|
|
paolone
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 21:01:24
| | [ #46 ] |
|
|
|
Super Member |
Joined: 24-Sep-2007 Posts: 1143
From: Unknown | | |
|
| I've run the test on my Icaros machine (Athlon64 X2 5200+ 2.6 GHz, 1 GB DDR2 800 MHz RAM) and it is actually slower than the virtual machine I tested before. That's odd, since my Core2 Quad machine also uses DDR2 800 MHz modules, and there AROS is running inside a VM. Anyway, I noticed that
1. running the test repeatedly always gives worse results (speed decreases over time)
2. having another application running, like OWB, even if it is idle, slows down the test
Anyway, here are the results for the real AROS machine:
1000: 31103 µs
1000000: 31757832 µs
I used Fab's original sources, compiled with gcc -o membench membench.c
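Fab's membench.c itself is not reproduced in this thread. For readers who want something to compile, here is a minimal sketch of the kind of loop being timed. It is an assumption-laden stand-in, not Fab's code: it uses malloc()/free() rather than AllocMem()/FreeMem(), POSIX gettimeofday() for timing, and a made-up table of block sizes (the real sizes are not listed here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

#define NUM_ALLOCS 20

/* Hypothetical block sizes in bytes; the sizes used by the real
 * benchmark are not given in this thread. */
static const size_t sizes[NUM_ALLOCS] = {
    16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
    24, 48, 96, 192, 384, 768, 1536, 3072, 6144, 12288
};

/* Wall-clock time in microseconds (POSIX gettimeofday). */
static uint64_t now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

/* Allocates every size, touches each block, frees them all, and
 * repeats. Returns the elapsed time in microseconds. */
static uint64_t run_membench(unsigned long iterations)
{
    void *blocks[NUM_ALLOCS];
    uint64_t start = now_us();

    for (unsigned long it = 0; it < iterations; it++) {
        for (int i = 0; i < NUM_ALLOCS; i++) {
            blocks[i] = malloc(sizes[i]);
            assert(blocks[i] != NULL);
            /* Write one byte so the allocation is actually used. */
            *(volatile unsigned char *)blocks[i] = (unsigned char)i;
        }
        for (int i = 0; i < NUM_ALLOCS; i++)
            free(blocks[i]);
    }
    return now_us() - start;
}
```

Calling `run_membench(1000)` from a `main()` and printing the result gives a figure in the same units as the "Elapsed time" lines posted here, though numbers from this sketch are not comparable with results from the real benchmark.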
|
|
Status: Offline |
|
|
number6
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 21:52:34
| | [ #47 ] |
|
|
|
Elite Member |
Joined: 25-Mar-2005 Posts: 11587
From: In the village | | |
|
| @Cyborg
Quote:
For the heck of it here the results with bernd_afa changes on OS4:
1000: 113044 µs (0.113044 s)
2000: 225120 µs (0.225120 s)
10000: 1155693 µs (1.155693 s)
50000: 5807400 µs (5.807400 s)
100000: 11615699 µs (11.615699 s)
1000000: 116656246 µs (116.656250 s) |
Amended results including the OS4.x version after Chris H's recent upload fixing an earlier issue with the OS4.x version running under OS4.0:
Micro GX - OS4.0 final+July update (representing the complete final package)
membench-bernd_afa_OS3
1000: Elapsed time: 97459 µs = 97 ms
2000: Elapsed time: 194081 µs = 194 ms
10000: Elapsed time: 1016048 µs = 1016 ms
50000: Elapsed time: 5074529 µs = 5074 ms
100000: Elapsed time: 10162837 µs = 10162 ms
membench-bernd_afa_OS4
1000: Elapsed time: 64124 µs = 64 ms
2000: Elapsed time: 128945 µs = 128 ms
10000: Elapsed time: 637945 µs = 637 ms
50000: Elapsed time: 3193822 µs = 3193 ms
100000: Elapsed time: 6400816 µs = 6400 ms
#6
Last edited by number6 on 20-Sep-2009 at 06:24 PM.
_________________ This posting, in its entirety, represents solely the perspective of the author. *Secrecy has served us so well* |
|
Status: Offline |
|
|
Fab
| |
Re: Interesting memory allocation benchmark Posted on 19-Sep-2009 22:36:26
| | [ #48 ] |
|
|
|
Super Member |
Joined: 17-Mar-2004 Posts: 1178
From: Unknown | | |
|
| @Karlos
Sure, especially in realtime applications, it's recommended to avoid dynamic allocations as much as possible.
But with an allocator like TLSF, allocation and the other operations have a bounded execution time, which makes them deterministic and so suitable for realtime usage.
Now, about desktop applications, even if you try to avoid allocations in critical code (which is not often the case in complex and badly designed c++ apps :)), you must also consider the system can run for several days/weeks/months/years (ok, unlikely for an amigaos-like :)). With a linked-list structure, and if we exclude the fragmentation problem, the allocation time could get really slow in the end, as opposed to TLSF (or any other o(1) allocator).
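To illustrate why a two-level segregated fit allocator does not slow down as free blocks pile up, here is a rough sketch of the size-to-list mapping as it is usually described for TLSF. It is not taken from the MorphOS or tlsfmem sources; the constant `SL_LOG2`, the helper names, and the portable bit-scan loop are all assumptions (real implementations normally use a CPU bit-scan instruction).

```c
#include <assert.h>
#include <stdint.h>

#define SL_LOG2  4u                 /* assumed: 16 sub-lists per size class */
#define SL_COUNT (1u << SL_LOG2)

/* Index of the highest set bit, i.e. floor(log2(x)), for x != 0.
 * The loop runs at most 31 times, so its cost is bounded. */
static unsigned fls32(uint32_t x)
{
    unsigned bit = 0;
    assert(x != 0);
    while (x >>= 1)
        bit++;
    return bit;
}

/* Map a request size to a (first-level, second-level) list index.
 * fl selects the power-of-two class, sl the linear slice inside it.
 * Sizes below SL_COUNT need special handling, omitted in this sketch. */
static void tlsf_mapping(uint32_t size, unsigned *fl, unsigned *sl)
{
    assert(size >= SL_COUNT);
    *fl = fls32(size);
    *sl = (unsigned)(size >> (*fl - SL_LOG2)) - SL_COUNT;
}
```

For example, a size of 460 bytes maps to fl = 8 (the 256..511 class) and sl = 12. The point is that the work is a couple of shifts and a bit scan regardless of how many blocks are free, whereas a single linked free list has to be walked node by node.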
|
|
Status: Offline |
|
|
marko
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 2:50:56
| | [ #49 ] |
|
|
|
Super Member |
Joined: 17-Dec-2007 Posts: 1816
From: Gothenburg, THE front side of Sweden ;), (via Finland), EU | | |
|
| Hmm, this is interesting... and worrying :(
Here's some more numbers...
OS4.1 Workbench (with Quick-Fix) on Sam440ep-flex 800MHz
power2:
1000: 5773 ms (5.773 s)
2000: 11628 ms (11.628 s)
10000: 59183 ms (59.183 s)
50000: --
100000: --
1000000: --
bernd_afa:
1000: 94 ms (0.094 s)
2000: 190 ms (0.190 s)
10000: 955 ms (0.955 s)
50000: 4804 ms (4.804 s)
100000: 9825 ms (9.825 s)
1000000: 95997 ms (95.997 s)
-- --
OS4.1 without startup-sequence on Sam440ep-flex 800MHz
power2:
1000: 5585 ms (5.585 s)
2000: 11178 ms (11.178 s)
10000: 55919 ms (55.919 s)
50000: --
100000: --
1000000: --
bernd_afa:
1000: 84 ms (0.084 s)
2000: 169 ms (0.169 s)
10000: 845 ms (0.845 s)
50000: 4230 ms (4.230 s)
100000: 8461 ms (8.461 s)
1000000: 84616 ms (84.616 s)
-- --
OS3.x WinUAE/AmigaForever on Vista (with tons of background processes), AMD Athlon 64 X2 Dual Core 4200+, 2.2 GHz
power2:
1000: 20 ms (0.020 s)
2000: 59 ms (0.059 s)
10000: 280 ms (0.280 s)
50000: 1419 ms (1.419 s)
100000: 2880 ms (2.880 s)
1000000: 28639 ms (28.639 s)
bernd_afa:
1000: 19 ms (0.019 s)
2000: 39 ms (0.039 s)
10000: 199 ms (0.199 s)
50000: 1039 ms (1.039 s)
100000: 2080 ms (2.080 s)
1000000: 20739 ms (20.739 s)
_________________ AmigaOS 4.1 FEu2 on Sam440ep-flex 800MHz 1GB RAM C128, A500+, A1200, A1200/40, AmigaForever 2008+09+16, 5 x86/x64 boxes Still waiting (or dreaming) for the Amiga revolution... m4rko.com/AMIGA |
|
Status: Offline |
|
|
fishy_fis
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 6:57:45
| | [ #50 ] |
|
|
|
Elite Member |
Joined: 29-Mar-2004 Posts: 2159
From: Australia | | |
|
| oops, accidental reposting of an earlier post.
Oh well, seeing as I made a post I needed to edit,....
@paolone
Those results are unusual. A VM should be significantly slower than a native setup, but more than that, there seems to be a huge difference between our two results: a factor of 5-10x. Granted, a Core2Duo @ 3.6 GHz is probably 150-200 percent the speed of an Athlon64 X2 5200+, but there's still a huge discrepancy. The only thing I can think of is that maybe Icaros is using resources in the background that it shouldn't? (I don't use Icaros.) I'd be interested to find out what's going on here.
Last edited by fishy_fis on 20-Sep-2009 at 07:10 AM. Last edited by fishy_fis on 20-Sep-2009 at 06:59 AM.
|
|
Status: Offline |
|
|
Hans
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 7:13:08
| | [ #51 ] |
|
|
|
Elite Member |
Joined: 27-Dec-2003 Posts: 5067
From: New Zealand | | |
|
| @Fab
Quote:
Fab wrote: Now, about desktop applications, even if you try to avoid allocations in critical code (which is not often the case in complex and badly designed c++ apps :)), you must also consider the system can run for several days/weeks/months/years (ok, unlikely for an amigaos-like :)). With a linked-list structure, and if we exclude the fragmentation problem, the allocation time could get really slow in the end, as opposed to TLSF (or any other o(1) allocator).
|
Considering that SLAB allocators are used on Unix systems that are kept running continually, I doubt that SLAB allocators suffer from this slowdown of memory allocation over time. I've never actually tested this with Amiga OS 4.x though (my machine is switched off when I'm not using it), so it remains to be seen what happens. Amiga OS 3 probably does have this problem.
Hans
_________________ http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project. https://keasigmadelta.com/ - More of my work. |
|
Status: Offline |
|
|
umisef
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 9:24:39
| | [ #52 ] |
|
|
|
Super Member |
Joined: 19-Jun-2005 Posts: 1714
From: Melbourne, Australia | | |
|
| @umisef
From my earlier posting on the SheevaPlug, using the small allocations: Quote:
100000 iterations: Elapsed time: 1506454 us (1.506454 s) |
I have now put the code on my iPhone 3GS, and its performance is 3.3s for 100,000 iterations with small allocations, and 9.8s for 100,000 iterations for the power-of-two allocations.
So it appears the SheevaPlug isn't quite as slow as it feels. It's twice the speed (in this) of the actual mobile phone CPU :) |
|
Status: Offline |
|
|
paolone
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 10:50:51
| | [ #53 ] |
|
|
|
Super Member |
Joined: 24-Sep-2007 Posts: 1143
From: Unknown | | |
|
| @fishy_fis
I use Icaros in the VM test as well.... |
|
Status: Offline |
|
|
itix
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 11:13:52
| | [ #54 ] |
|
|
|
Elite Member |
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world | | |
|
| @umisef
Quote:
Anyway --- if the relative merits (rather than failings) of allocators are what you want to look at, this benchmark is not particularly interesting. The allocation/deallocation patterns are very regular, and very friendly. If you want to look at these things in the scenarios their complexity is meant to tackle, you'd need something like the source from here, which will happily fragment the memory map :)
Here is the result from the Sheevaplug: Quote:
kittycam@ubuntu:~$ ./amemtest2 40960 1 3000000
3000000 iterations: Elapsed time: 1215286 us (1.215286 s), 0.405095 us per
kittycam@ubuntu:~$ ./amemtest2 40960 10 3000000
3000000 iterations: Elapsed time: 5795645 us (5.795645 s), 1.931882 us per
kittycam@ubuntu:~$ ./amemtest2 40960 100 3000000
3000000 iterations: Elapsed time: 3527694 us (3.527694 s), 1.175898 us per
kittycam@ubuntu:~$ ./amemtest2 40960 1000 3000000
3000000 iterations: Elapsed time: 4560545 us (4.560545 s), 1.520182 us per
kittycam@ubuntu:~$ ./amemtest2 40960 10000 3000000
3000000 iterations: Elapsed time: 7855803 us (7.855803 s), 2.618601 us per
|
|
Here are my results from Pegasos II G4 (1GHz).
MorphOS 2.3 with TLSF allocator: Quote:
Varasto:Lähdekoodit/membench> amemtest2 40960 1 3000000
3000000 iterations: Elapsed time: 1653965 us (1.653965 s), 0.551322 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 10 3000000
3000000 iterations: Elapsed time: 1676783 us (1.676783 s), 0.558928 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 100 3000000
3000000 iterations: Elapsed time: 1985348 us (1.985348 s), 0.661783 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 1000 3000000
3000000 iterations: Elapsed time: 2747902 us (2.747902 s), 0.915967 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 10000 3000000
3000000 iterations: Elapsed time: 5915103 us (5.915103 s), 1.971701 us per
|
MorphOS 2.3 with SafeMemLists (MorphOS 1.x style memory system): Quote:
Varasto:Lähdekoodit/membench> amemtest2 40960 1 3000000
3000000 iterations: Elapsed time: 16400619 us (16.400619 s), 5.466873 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 10 3000000
3000000 iterations: Elapsed time: 16686613 us (16.686613 s), 5.562204 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 100 3000000
3000000 iterations: Elapsed time: 18620755 us (18.620755 s), 6.206918 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 1000 3000000
3000000 iterations: Elapsed time: 36703252 us (36.703252 s), 12.234417 us per
Varasto:Lähdekoodit/membench> amemtest2 40960 10000 3000000
3000000 iterations: Elapsed time: 255323008 us (255.323008 s), 85.107669 us per
|
When interpreting the results, the reader should pay attention to the fact that in MorphOS malloc()/free() is mapped to AllocPooled()/FreePooled() calls, while Fab's benchmark used AllocMem()/FreeMem().
_________________ Amiga Developer Amiga 500, Efika, Mac Mini and PowerBook |
|
Status: Offline |
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 11:18:42
| | [ #55 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| >Now, about desktop applications, even if you try to avoid allocations in critical code >(which is not often the case in complex and badly designed c++ apps :)),
Right, and C++ programs are a real problem to develop with the memory-tracking tools that are recommended for verifying that programs do not trash memory or overflow buffers.
Please try the 8 KB allocation test with Wipeout running on MOS 2.0 and post the time for 10000 iterations.
http://aminet.net/package/dev/debug/Wipeout-morphos
OS4 users can do this too; the program there is called memguard.
When I use the best-case version, which only allocates 8 KB, and run Wipeout, 10000 iterations take 11 seconds; without it they take 0.6 seconds on my WinUAE system.
With tlsfmem they take 0.2 seconds without Wipeout and 7 seconds with it.
This slowness is the reason C++ programs run extremely slowly under Wipeout, so Wipeout cannot be used for every test run. That is bad for program quality during development; I always want to run Wipeout when I develop.
I notice this large slowdown in all C++ programs, and it is extreme in OWB. libxml, which NetSurf needs, also does lots of allocations: with NetSurf and Wipeout running, large pages take several minutes to display, while they are shown in a few seconds without Wipeout.
OpenRedAlert, for example, takes over 3 minutes to start with Wipeout.
I don't understand why Wipeout causes so much slowdown. I have dual-channel DDR memory, and SysSpeed shows a fast2fast memory transfer rate of over 800 megabytes.
And with 10000 allocations of 8 KB each in 11 seconds, that is only roughly 10 megabytes per second of memory checking.
I think it is very important to get a faster Wipeout. I don't understand why it is so slow; maybe it needs to keep the memory list in a hash table or similar to speed it up.
Last edited by bernd_afa on 20-Sep-2009 at 11:28 AM. Last edited by bernd_afa on 20-Sep-2009 at 11:23 AM. Last edited by bernd_afa on 20-Sep-2009 at 11:22 AM. Last edited by bernd_afa on 20-Sep-2009 at 11:21 AM.
|
|
Status: Offline |
|
|
NutsAboutAmiga
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 11:31:47
| | [ #56 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12817
From: Norway | | |
|
| |
Status: Offline |
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 11:35:28
| | [ #57 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| |
Status: Offline |
|
|
wawa
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 13:03:54
| | [ #58 ] |
|
|
|
Elite Member |
Joined: 21-Jan-2008 Posts: 6259
From: Unknown | | |
|
| |
Status: Offline |
|
|
itix
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 17:03:58
| | [ #59 ] |
|
|
|
Elite Member |
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world | | |
|
| @bernd_afa
Quote:
I don't understand why Wipeout causes so much slowdown. I have dual-channel DDR memory, and SysSpeed shows a fast2fast memory transfer rate of over 800 megabytes.
|
Each time you allocate or deallocate memory, Wipeout fills the memory block with a 0xDEADBEEF (or similar) pattern and checks the tracked memory.
_________________ Amiga Developer Amiga 500, Efika, Mac Mini and PowerBook |
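To make the cost visible, here is a rough sketch of a tracking allocator along the lines described above. This is not Wipeout's source; the structure layout, the names, the single fill byte and the simple linked list of tracked blocks are all assumptions chosen for illustration, and a real tool would report damage instead of just counting it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FILL_BYTE  0xDE          /* stand-in for a 0xDEADBEEF-style fill */
#define GUARD_WORD 0xDEADBEEFu

struct tracked {
    struct tracked *next;
    size_t size;                 /* payload size in bytes */
    uint32_t guard_front;        /* guard word kept in the header */
    /* payload follows the header, then a trailing guard word */
};

static struct tracked *tracked_list = NULL;

static unsigned char *payload_of(struct tracked *t)
{
    return (unsigned char *)(t + 1);
}

/* Walk every tracked block and verify both guards.
 * Returns the number of damaged blocks found. */
static size_t check_all(void)
{
    size_t damaged = 0;
    for (struct tracked *t = tracked_list; t != NULL; t = t->next) {
        uint32_t tail;
        memcpy(&tail, payload_of(t) + t->size, sizeof tail);
        if (t->guard_front != GUARD_WORD || tail != GUARD_WORD)
            damaged++;
    }
    return damaged;
}

static void *debug_alloc(size_t size)
{
    const uint32_t guard = GUARD_WORD;
    struct tracked *t;

    (void)check_all();           /* full scan on every allocation */
    t = malloc(sizeof *t + size + sizeof guard);
    if (t == NULL)
        return NULL;
    t->size = size;
    t->guard_front = guard;
    memset(payload_of(t), FILL_BYTE, size);             /* fill new block */
    memcpy(payload_of(t) + size, &guard, sizeof guard); /* trailing guard */
    t->next = tracked_list;
    tracked_list = t;
    return payload_of(t);
}

static void debug_free(void *ptr)
{
    struct tracked *t = (struct tracked *)ptr - 1;
    struct tracked **link = &tracked_list;

    (void)check_all();           /* full scan on every free */
    while (*link != NULL && *link != t)
        link = &(*link)->next;
    assert(*link == t);          /* pointer must be a tracked block */
    *link = t->next;
    memset(payload_of(t), FILL_BYTE, t->size);           /* wipe on free */
    free(t);
}
```

In a design like this, every call pays for a fill of the whole block plus a scan over all live blocks, so the cost per call grows with the number of outstanding allocations. If Wipeout works roughly this way, that would be consistent with allocation-heavy C++ programs slowing down so much more than the raw memory bandwidth suggests.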
|
Status: Offline |
|
|
ChrisH
| |
Re: Interesting memory allocation benchmark Posted on 20-Sep-2009 18:03:07
| | [ #60 ] |
|
|
|
Elite Member |
Joined: 30-Jan-2005 Posts: 6679
From: Unknown | | |
|
| @number6 & others I have recompiled & uploaded the OS4 versions, without any SObj dependencies (although anyone could have done the same by installing the last public version of PortablE, which does not have that problem).
So if any OS4.0 users want to run those test executables, they can now. _________________ Author of the PortablE programming language. It is pitch black. You are likely to be eaten by a grue... |
|
Status: Offline |
|
|