Poster | Thread |
umisef
| |
Re: Interesting memory allocation benchmark Posted on 22-Sep-2009 16:43:03
| | [ #81 ] |
|
|
|
Super Member |
Joined: 19-Jun-2005 Posts: 1714
From: Melbourne, Australia | | |
|
| @mike
Quote:
What changes did you make to the mac compile?
|
Same as for the linux compiles I mentioned earlier --- use "malloc()" and "free()" for allocation/deallocation, and gettimeofday() for timing.
In fact, I simply took the linux-ified source and compiled it without any modification; I only remembered the G4 was running MacOS (rather than linux) when I tried looking at the details of the CPU, and the (linux-specific) /proc/cpuinfo wasn't there.
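For readers who have not seen the linux-adapted source, the change described above amounts to roughly the following. This is a minimal sketch, not the benchmark's actual code: the function names, block count and block size are hypothetical, and only malloc(), free() and gettimeofday() come from the post.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static long long elapsed_us(const struct timeval *start,
                            const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical workload: allocate `count` blocks of `size` bytes and free
   them again, `iterations` times. Returns the elapsed time in microseconds,
   or -1 if an allocation fails. */
static long long time_malloc_free(int iterations, int count, size_t size)
{
    struct timeval start, end;
    void **blocks = malloc((size_t)count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    gettimeofday(&start, NULL);
    for (int i = 0; i < iterations; i++) {
        for (int j = 0; j < count; j++) {
            blocks[j] = malloc(size);
            if (blocks[j] == NULL) {
                /* Undo the partial round before giving up. */
                while (j-- > 0)
                    free(blocks[j]);
                free(blocks);
                return -1;
            }
        }
        for (int j = 0; j < count; j++)
            free(blocks[j]);
    }
    gettimeofday(&end, NULL);

    free(blocks);
    return elapsed_us(&start, &end);
}
```

Calling time_malloc_free(1000, 20, 64) would time 1000 rounds of 20 small allocations. The numbers it gives depend entirely on the C library's allocator, which is the point being discussed in this thread.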
Quote:
Would be interesting to run this on linux 68k as well.
|
If you have an installation running, knock yourself out. The linux-adapted source is here. Also, it would be nice to compare OS4/MorphOS/linux on the same Peg2 :)
|
|
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 22-Sep-2009 17:33:36
| | [ #82 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @Seiya
>1000 = 280001 us >2000 = 599999 us >10000 = 2840000 us
Your values are horribly bad.
I booted AmiKit and also get not-so-good values, but they are a lot faster than yours. The reason for the slowdown is the patch "memtrailer 96", which is started in the startup-sequence and adds a 96-byte boundary to every memory allocation.
When you remove it, or change the 96 to 8, the benchmark gets faster.
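To illustrate why a per-allocation trailer costs both memory and time, here is a minimal sketch of a guard-wall wrapper around malloc()/free(). It is not memtrailer's or Wipeout's implementation; guarded_alloc, guarded_wall_intact, guarded_free and the fill byte are hypothetical names and values chosen for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WALL_SIZE 96     /* trailer bytes per allocation, as in "memtrailer 96" */
#define WALL_BYTE 0xA5   /* arbitrary fill pattern (assumption) */

/* Header stored in front of the user block so the trailer can be found
   later. The extra union members only keep the user pointer aligned. */
typedef union {
    size_t size;
    long double ld;
    long long ll;
    void *ptr;
} guard_header;

/* Allocate `size` usable bytes followed by a WALL_SIZE-byte trailer. */
static void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(guard_header) - WALL_SIZE)
        return NULL;

    guard_header *h = malloc(sizeof *h + size + WALL_SIZE);
    if (h == NULL)
        return NULL;

    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, WALL_BYTE, WALL_SIZE);   /* extra work on every alloc */
    return user;
}

/* Returns 1 if the trailer still holds the fill pattern, 0 if it was
   overwritten. */
static int guarded_wall_intact(const void *ptr)
{
    const guard_header *h = (const guard_header *)ptr - 1;
    const unsigned char *wall = (const unsigned char *)ptr + h->size;

    for (size_t i = 0; i < WALL_SIZE; i++)
        if (wall[i] != WALL_BYTE)
            return 0;
    return 1;
}

/* Release a block obtained from guarded_alloc(). */
static void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((guard_header *)ptr - 1);
}
```

With small blocks the 96 trailer bytes can be larger than the block itself, so shrinking the trailer to 8 bytes reduces both the fill work and the memory touched per allocation.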
But all in all, the speed of the AOS memory system depends a lot on luck. I ran the test after boot, started several programs, measured, closed them, measured again, and got different values from 2 sec up to 9.3 sec. An AMD64 3000+ (1.8 GHz clock) system with WinUAE is of course a lot faster than OS4 or MOS systems. You can see this when you compare the values with the MOS 1.4 mem system, which needs 16 sec.
The fastest value I got came after the longest use:
New Shell process 6
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 3739999 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 9340000 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 3779996 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 8499994 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 4739998 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 2000001 µs
6.AmiKit:>
But MOS tlsfmem seems better than 68k tlsfmem. When I use the 68k one I do not get much speedup, and the test always executes in 1.9-2.1 sec.
But OK, WinUAE does not like code that executes only 2 instructions and then branches. The JIT could be enhanced here so that such short loops do not check whether a chipset event occurred, but your PC must be a lot faster.
Last edited by bernd_afa on 22-Sep-2009 at 05:37 PM.
|
|
|
|
Interesting
| |
Re: Interesting memory allocation benchmark Posted on 22-Sep-2009 17:35:23
| | [ #83 ] |
|
|
|
Super Member |
Joined: 29-Mar-2004 Posts: 1812
From: a place & time long long ago, when things mattered. | | |
|
| @umisef
Quote:
1.25GHz G4, 512k L2, 167MHz FSB, MacOS X Leopard |
thx, now for the review.
_________________ "The system no longer works " -- Young Anakin Skywalker |
|
|
|
itix
| |
Re: Interesting memory allocation benchmark Posted on 22-Sep-2009 17:44:45
| | [ #84 ] |
|
|
|
Elite Member |
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world | | |
|
| @bernd_afa
Quote:
Wipeout does not check memory on every memory alloc/free. The Wipeout check needs to be called, or a period added, so that it checks everything.
|
It checks (unless disabled from commandline options) if the allocation list is consistent before performing allocation or deallocation. It does not check for the pre/postwall damage.
Quote:
Have you test on your Peg what the memtest values are when you use wipeout ?
|
It gets very slow. Surprisingly slow.
Quote:
TLSF+Wipeout:
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 21960671 µs
Varasto:Lähdekoodit/membench> membench 2000
Elapsed time: 46136320 µs
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 23054232 µs
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 23104360 µs
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 23113834 µs
|
_________________ Amiga Developer Amiga 500, Efika, Mac Mini and PowerBook |
|
|
|
afxgroup
| |
Re: Interesting memory allocation benchmark Posted on 22-Sep-2009 17:53:05
| | [ #85 ] |
|
|
|
Super Member |
Joined: 8-Mar-2004 Posts: 1968
From: Taranto, Italy | | |
|
| @all
Just for fun I've tested it on OS4.1 on my slow A1 G4 933 MHz, using malloc()/free():
Quote:
1000 iterations: Elapsed time: 16162 us (0.016162 s)
2000 iterations: Elapsed time: 32294 us (0.032294 s)
10000 iterations: Elapsed time: 161292 us (0.161292 s)
50000 iterations: Elapsed time: 809667 us (0.809667 s)
100000 iterations: Elapsed time: 1619899 us (1.619899 s)
1000000 iterations: Elapsed time: 16170085 us (16.170085 s)
|
...and so? Has anyone else tried this on their A1 yet?!
_________________ http://www.amigasoft.net |
|
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 22-Sep-2009 18:09:53
| | [ #86 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @itix
Your values are over 20 times slower than my results. I think you ran the test with the large mem from Fab. It is possible that this takes long, because the test fills a lot of memory and gets cache misses. But in the real world, luckily, the most frequent memory allocations are small.
I looked in the Wipeout docs and saw that the consistency check is off by default. I verified this and tested the speed with it switched on: 1000 iterations then need 120 sec, and even my fast WinUAE crawls on all operations.
CONSISTENCECHECK
NOCONSISTENCECHECK
These options control whether Wipeout will run a consistency check on all its memory tracking data structures before performing memory deallocations and tracked allocations. This test may slow down the operation of the Amiga, especially on already very slow machines. Still, it is required to assure proper operation of the program; without consistent data structures, Wipeout will crash or fail to perform correctly. Default is NOCONSISTENCECHECK.
>@all
>just for fun i've tested it on OS4.1 on my slow A1 G4-933 mhz... using malloc()/free()
>Quote:
>1000 iterations: Elapsed time: 16162 us (0.016162 s) >2000 iterations: Elapsed time: 32294 us (0.032294 s) >10000 iterations: Elapsed time: 161292 us (0.161292 s) >50000 iterations: Elapsed time: 809667 us (0.809667 s) >100000 iterations: Elapsed time: 1619899 us (1.619899 s) >1000000 iterations: Elapsed time: 16170085 us (16.170085 s)
>......and so? >but anyone has tried this yet on his A1?!?!
Maybe the malloc/free implementation uses poolmem or another allocator. When I compile this test for ixemul poolmem or libnix and use malloc, it runs 100000 in 0.6 sec.
But that is of course not realistic, because it is a fresh pool that contains only 20 mem entries.
When I use ixemul with the buddy allocator the test takes 10 sec, which is a lot slower, but NetSurf runs a lot faster with the buddy allocator.
Last edited by bernd_afa on 22-Sep-2009 at 06:29 PM.
|
|
|
|
mike
| |
Re: Interesting memory allocation benchmark Posted on 23-Sep-2009 15:03:53
| | [ #87 ] |
|
|
|
Regular Member |
Joined: 31-Jul-2007 Posts: 406
From: Alpha Centauri | | |
|
| There doesn't seem to be any slowdown with gcc on x86. I tested and retested; the results are the same for all versions.
mike@nellyphant:~/bench$ ./amemtest2 40960 1 3000000
3000000 iterations: Elapsed time: 275414 us (0.275414 s), 0.091805 us per
mike@nellyphant:~/bench$ ./amemtest2 40960 100 3000000
3000000 iterations: Elapsed time: 832398 us (0.832398 s), 0.277466 us per
mike@nellyphant:~/bench$ ./amemtest2 40960 1000 3000000
3000000 iterations: Elapsed time: 900612 us (0.900612 s), 0.300204 us per
mike@nellyphant:~/bench$ ./amemtest2 40960 10000 3000000
3000000 iterations: Elapsed time: 1886320 us (1.886320 s), 0.628773 us per

mike@nellyphant:~/bench$ gcc-2.95 amemtest2.c -o amemtest2-295 -O2
gcc-2.95
mike@nellyphant:~/bench$ ./amemtest2-295 40960 1 3000000
3000000 iterations: Elapsed time: 356774 us (0.356774 s), 0.118925 us per
mike@nellyphant:~/bench$ ./amemtest2-295 40960 1 3000000
3000000 iterations: Elapsed time: 342401 us (0.342401 s), 0.114134 us per
mike@nellyphant:~/bench$ gcc-3.3 amemtest2.c -o amemtest2-333 -O2
gcc-3.3
mike@nellyphant:~/bench$ gcc-3.3 amemtest2.c -o amemtest2-333 -O2
mike@nellyphant:~/bench$ ./amemtest2-333 40960 1 3000000
3000000 iterations: Elapsed time: 360595 us (0.360595 s), 0.120198 us per
mike@nellyphant:~/bench$ ./amemtest2-333 40960 1 3000000
3000000 iterations: Elapsed time: 362392 us (0.362392 s), 0.120797 us per
mike@nellyphant:~/bench$ ./amemtest2-41 40960 1 3000000
3000000 iterations: Elapsed time: 334351 us (0.334351 s), 0.111450 us per
mike@nellyphant:~/bench$ ./amemtest2-41 40960 1 3000000
3000000 iterations: Elapsed time: 338222 us (0.338222 s), 0.112741 us per
Last edited by mike on 23-Sep-2009 at 03:20 PM.
_________________ C= Amiga addict ,,, (Oo) ⎛☮ໄ ﮑὠՀ Couldn't care less what other people think, seeing that there's concrete evidence they don't. |
|
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 24-Sep-2009 19:43:58
| | [ #88 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| The reason tlsfmem is slower on my WinUAE than on a 1 GHz Peg is that the 68k version uses lots of bfxxxx instructions. The UAE JIT does not support such instructions, so they execute slowly in the interpreter. I think they are not very fast on the 68060 either.
To get faster programs for WinUAE and real 68k, it is useful to compile with the -mnobitfield option.
Last edited by bernd_afa on 24-Sep-2009 at 07:44 PM.
|
|
|
|
wawa
| |
Re: Interesting memory allocation benchmark Posted on 24-Sep-2009 20:05:36
| | [ #89 ] |
|
|
|
Elite Member |
Joined: 21-Jan-2008 Posts: 6259
From: Unknown | | |
|
| @bernd_afa
then get the source from chris and optimize it;D |
|
|
|
mike
| |
Re: Interesting memory allocation benchmark Posted on 24-Sep-2009 21:34:08
| | [ #90 ] |
|
|
|
Regular Member |
Joined: 31-Jul-2007 Posts: 406
From: Alpha Centauri | | |
|
| @bernd_afa
tlsfmem allocatorbenchmark results without a startup-sequence

allocatorbenchmark RANDOMLY
Testing overhead: 0.181 secs.
Time: 11.481 secs for 50000 Allocs 47146 Free Ops.
Largest: 3537464

allocatorbenchmark
Testing overhead: 0.185 secs.
Time: 1.407 secs for 50000 Allocs 49654 Free Ops.
Largest: 449584

With tlsfmem

allocatorbenchmark RANDOMLY
Testing overhead: 0.181 secs.
Time: 2.628 secs for 50000 Allocs 48210 Free Ops.
Largest: 2384784

allocatorbenchmark
Testing overhead: 0.185 secs.
Time: 2.429 secs for 50000 Allocs 49644 Free Ops.
Largest: 528080
Anybody got an 040 handy?
Btw, didn't you get the source already? Or rather permission to use a disassembly?
(doing membench now)
OK, now I'm successfully confused...
http://i35.tinypic.com/2vjput5.png
Last edited by mike on 24-Sep-2009 at 09:46 PM.
_________________ C= Amiga addict ,,, (Oo) ⎛☮ໄ ﮑὠՀ Couldn't care less what other people think, seeing that there's concrete evidence they don't. |
|
|
|
matthey
| |
Re: Interesting memory allocation benchmark Posted on 25-Sep-2009 4:35:04
| | [ #91 ] |
|
|
|
Elite Member |
Joined: 14-Mar-2007 Posts: 2008
From: Kansas | | |
|
| @bernd_afa
The BFFFO instruction is one of the key elements of TLSFmem's constant-time behaviour. It finds the first bit set in a bit field. BFFFO could be replaced with AND and branch instructions, but I think that would be slower on average on the 68060: BFFFO takes 9 cycles and is pOEP-only, but a single mispredicted branch costs 7 cycles. The problem is that C compilers often use the bit field instructions where they shouldn't. They can be faster on the 68060 and they are powerful. They should be used on the 68040 for the fastest performance. UAE moved more towards a RISC philosophy by giving up the less used instructions. You say branches are not so fast on UAE either. That doesn't leave any choices if you want TLSFmem to be fast on UAE.
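To make the AND-and-branch alternative concrete, here is a minimal C sketch of what such a replacement could look like for a 32-bit field. It is only an illustration under assumptions: first_one_from_msb is a hypothetical name, this is not TLSFmem's code, and it follows the BFFFO convention of counting the offset from the most significant bit and returning the field width when no bit is set.

```c
#include <stdint.h>

/* Branch-based stand-in for BFFFO on a 32-bit field: returns the offset of
   the first set bit counted from the most significant bit (0..31), or 32 if
   no bit is set. Each step halves the remaining search range. */
static unsigned first_one_from_msb(uint32_t x)
{
    unsigned offset = 0;

    if (x == 0)
        return 32;
    if ((x & 0xFFFF0000u) == 0) { offset += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { offset += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { offset += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { offset += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { offset += 1; }
    return offset;
}
```

The sketch needs up to six compare-and-branch steps per lookup, and whether a branch is taken depends on the data. That is where the misprediction cost mentioned above would come in.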
|
|
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 26-Sep-2009 13:50:46
| | [ #92 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| I can reassemble tlsfmem, but I have done nothing yet. There is also an update (1.6) of tlsfmem in the luciferin archive that Chris made.
There is also poolmem on Aminet, which gives more constant values. My system then needs 6.3-6.9 sec for 100000 calls.
But its docs also say that it does not work with memtrackers. So a clean solution would need one build that allocates the memory and munges it itself.
I still don't understand why such memtracker tools do not work well, as the readme says.
OK, the fill at start is not correct, because there is no memlist, but then nothing is filled. There should be no crash possible, though.
On FreeMem, Wipeout first fills the memory with values and then releases it. Or maybe UAE has a bug with the bfffo instruction.
Can somebody test whether tlsfmem and Wipeout work well together on a classic?
When I start YAM it crashes. Crashes happen often when I use MUI programs, but only when I use Wipeout.
Last edited by bernd_afa on 26-Sep-2009 at 01:52 PM.
|
|
|
|
mike
| |
Re: Interesting memory allocation benchmark Posted on 26-Sep-2009 18:12:29
| | [ #93 ] |
|
|
|
Regular Member |
Joined: 31-Jul-2007 Posts: 406
From: Alpha Centauri | | |
|
| @bernd_afa
Wipeout crashes my amiga
_________________ C= Amiga addict ,,, (Oo) ⎛☮ໄ ﮑὠՀ Couldn't care less what other people think, seeing that there's concrete evidence they don't. |
|
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 27-Sep-2009 16:05:42
| | [ #94 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| I took a look at the newlib code. There is a link that explains the memory allocator it uses:
http://g.oswego.edu/dl/html/malloc.html
So this explains why afxgroup gets good speed on OS4: most memory allocations are handled without going through the OS4 memory system.
newlib is a BSD lib similar to ixemul, but because newlib is a stripped-down version for embedded systems, some things are missing.
I don't know if this memory system is better or worse than the buddy allocator used in ixemul. It also contains a debug setting. Maybe in debug mode it is faster than tlsfmem with Wipeout and gives the same memory safety as Wipeout.
I think for best compatibility it is best to only add a fast memory system to malloc/free. An AmigaOS coder who needs speed can then use the malloc functions too, once this is in libnix or ixemul.
Last edited by bernd_afa on 27-Sep-2009 at 04:12 PM.
|
|
|
|
NutsAboutAmiga
| |
Re: Interesting memory allocation benchmark Posted on 27-Sep-2009 17:04:34
| | [ #95 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| |
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 27-Sep-2009 18:03:45
| | [ #96 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @NutsAboutAmiga
>Not at all, every thing has to go trow the OS4 memory system some how.
Only large blocks are allocated, and memory is very seldom requested from the OS. That is the reason why the speed on OS4 goes up.
ixemul does the same.
When you use malloc with ixemul, in C++ programs or in this benchmark, malloc and free are much faster than the original Amiga AllocMem functions. |
|
|
|
NutsAboutAmiga
| |
Re: Interesting memory allocation benchmark Posted on 27-Sep-2009 18:05:48
| | [ #97 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| |
|
|
bernd_afa
| |
Re: Interesting memory allocation benchmark Posted on 28-Sep-2009 9:32:15
| | [ #98 ] |
|
|
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @NutsAboutAmiga
>Memory comes frome the same free pool, it has to be done by the OS, >does it not?
malloc memory allocators allocate a large block and split it into smaller blocks. If a program only needs to allocate some memory and free it frequently, this is handled internally in the malloc functions.
AOS libnix also uses poolmem. This creates a memory pool for the program and allocates 4 KB at a time, every time it needs more memory.
When a program allocates 100*32 bytes, only 3200 bytes are needed and no global allocation is done. AOS poolmem has fragmentation problems too, but they are not noticeable in this memory benchmark.
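The idea of carving small allocations out of 4 KB chunks can be sketched in C as follows. This is a simplified illustration, not the libnix or exec pool code: the names pool, puddle, pool_alloc and pool_destroy are hypothetical, individual frees are left out, and requests larger than one chunk are simply refused.

```c
#include <stddef.h>
#include <stdlib.h>

#define PUDDLE_SIZE 4096   /* chunk size requested from the system, per the post */

/* One chunk obtained from the system allocator. */
typedef struct puddle {
    struct puddle *next;
    size_t used;                        /* bytes already handed out */
    unsigned char data[PUDDLE_SIZE];
} puddle;

typedef struct {
    puddle *head;             /* most recently obtained chunk */
    unsigned system_allocs;   /* how often the system allocator was called */
} pool;

/* Hand out `size` bytes from the current chunk, fetching a new chunk from
   the system only when the current one cannot satisfy the request. */
static void *pool_alloc(pool *p, size_t size)
{
    if (size == 0 || size > PUDDLE_SIZE)
        return NULL;                      /* not handled in this sketch */
    size = (size + 7u) & ~(size_t)7u;     /* round up to a multiple of 8 */

    if (p->head == NULL || PUDDLE_SIZE - p->head->used < size) {
        puddle *fresh = malloc(sizeof *fresh);
        if (fresh == NULL)
            return NULL;
        fresh->next = p->head;
        fresh->used = 0;
        p->head = fresh;
        p->system_allocs++;
    }

    void *block = p->head->data + p->head->used;
    p->head->used += size;
    return block;
}

/* Return every chunk to the system at once. */
static void pool_destroy(pool *p)
{
    while (p->head != NULL) {
        puddle *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}
```

With this layout, 100 allocations of 32 bytes fit into a single chunk, so the system allocator is called only once. That matches the 100*32 example above.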
On my WinUAE system 100000 mallocs need only 0.6 sec.
newlib uses the Doug Lea memory allocator, dlmalloc for short, which does some kind of caching.
Please read the text about caching in the newlib allocator. There you can see that malloc seldom calls the OS4 memory allocation.
http://g.oswego.edu/dl/html/malloc.html
"""" Caching In the most straightforward version of the basic algorithm, each freed chunk is immediately coalesced with neighbors to form the largest possible unused chunk. Similarly, chunks are created (by splitting larger chunks) only when explicitly requested.
Operations to split and to coalesce chunks take time. This time overhead can sometimes be avoided by using either of both of two caching strategies:
Deferred Coalescing
Rather than coalescing freed chunks, leave them at their current sizes in hopes that another request for the same size will come along soon. This saves a coalesce, a later split, and the time it would take to find a non-exactly-matching chunk to split.
Preallocation
Rather than splitting out new chunks one-by one, pre-split many at once. This is normally faster than doing it one-at-a-time.
Because the basic data structures in the allocator permit coalescing at any time, in any of malloc, free, or realloc, corresponding caching heuristics are easy to apply. The effectiveness of caching obviously depends on the costs of splitting, coalescing, and searching relative to the work needed to track cached chunks. Additionally, effectiveness less obviously depends on the policy used in deciding when to cache versus coalesce them.
Caching can be a good idea in programs that continuously allocate and release chunks of only a few sizes. For example, if you write a program that allocates and frees many tree nodes, .... """"
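The deferred-coalescing idea from the quoted text can be illustrated with a very small cache for one block size. This is a sketch under assumptions, not dlmalloc's implementation: chunk_cache, cache_alloc, cache_free, cache_release and the 32-byte node size are hypothetical, and malloc()/free() stand in for the slower underlying allocator.

```c
#include <stdlib.h>

#define NODE_SIZE 32   /* the single block size this cache serves (assumption) */

/* A chunk is either in use (payload) or parked in the cache (next link). */
typedef union cached_chunk {
    union cached_chunk *next;
    unsigned char payload[NODE_SIZE];
} cached_chunk;

typedef struct {
    cached_chunk *free_list;   /* freed chunks kept at their current size */
    unsigned backend_calls;    /* requests that reached malloc() */
} chunk_cache;

/* Reuse a parked chunk if one exists, otherwise ask the backend. */
static void *cache_alloc(chunk_cache *c)
{
    if (c->free_list != NULL) {
        cached_chunk *chunk = c->free_list;
        c->free_list = chunk->next;
        return chunk;
    }
    c->backend_calls++;
    return malloc(sizeof(cached_chunk));
}

/* Park the chunk instead of returning (and coalescing) it right away. */
static void cache_free(chunk_cache *c, void *ptr)
{
    if (ptr == NULL)
        return;
    cached_chunk *chunk = ptr;
    chunk->next = c->free_list;
    c->free_list = chunk;
}

/* Give all parked chunks back to the backend. */
static void cache_release(chunk_cache *c)
{
    while (c->free_list != NULL) {
        cached_chunk *next = c->free_list->next;
        free(c->free_list);
        c->free_list = next;
    }
}
```

A benchmark that repeatedly allocates and frees blocks of the same few sizes hits the parked-chunk path almost every time, which would explain why the underlying OS allocator is rarely reached.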
Last edited by bernd_afa on 28-Sep-2009 at 09:32 AM.
|
|
|
|
Tomppeli
| |
Re: Interesting memory allocation benchmark Posted on 28-Sep-2009 15:13:10
| | [ #99 ] |
|
|
|
Super Member |
Joined: 18-Jun-2004 Posts: 1652
From: Home land of Santa, sauna, sisu and salmiakki | | |
|
| @thread
(Edit 2...) My personal opinion on the debate about different memory subsystems: as long as there is no full memory protection in the system, buggy apps force the end user to reboot frequently enough that memory fragmentation doesn't matter. But once there is full memory protection and end users expect to run their computers 24/7 all year round, low fragmentation and the ability to defragment memory automatically start to matter, and any system causing low fragmentation is worth more than its weight in gold.
Last edited by Tomppeli on 28-Sep-2009 at 09:03 PM.
_________________ Rock lobster bit me. My Workbench has always preferences. X1000 + AmigaOS4.1 FE "Anyone can build a fast CPU. The trick is to build a fast system." -Seymour Cray |
|
|
|
Fab
| |
Re: Interesting memory allocation benchmark Posted on 28-Sep-2009 15:22:57
| | [ #100 ] |
|
|
|
Super Member |
Joined: 17-Mar-2004 Posts: 1178
From: Unknown | | |
|
| @Tomppeli
Not exactly sure what you mean, but the results i got perfectly matched the actual spent time. And i was never able to complete the 1000000 iteration test on OS4 even though i waited more than 30 minutes :) |
|
|
|