Amigaworld.net - The Amiga Computer Community Portal Website

Interesting memory allocation benchmark

umisef

Re: Interesting memory allocation benchmark
Posted on 22-Sep-2009 15:43:03

[ #81 ]

Super Member

Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@mike

Quote:
What changes did you make to the mac compile?

Same as for the linux compiles I mentioned earlier --- use "malloc()" and "free()" for allocation/deallocation, and gettimeofday() for timing.

In fact, I simply took the linux-ified source and compiled it without any modification; I only remembered the G4 was running MacOS (rather than linux) when I tried looking at the details of the CPU, and the (linux-specific) /proc/cpuinfo wasn't there.

Quote:
Would be interesting to run this on linux 68k as well.

If you have an installation running, knock yourself out. The linux-adapted source is here.
Also, it would be nice to compare OS4/MorphOS/linux on the same Peg2 :)
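
For readers who want to reproduce the approach described above (malloc()/free() for allocation, gettimeofday() for timing), here is a minimal sketch of such a harness. It is not the linked source: the function names, the 20 live blocks per iteration and the block size parameter are assumptions made for illustration.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Number of blocks held at once per iteration (an assumption, not
   taken from the original benchmark source). */
#define NBLOCKS 20

/* Wall-clock time in microseconds, read with gettimeofday(). */
static long long now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Allocate and free NBLOCKS blocks per iteration and return the
   elapsed time in microseconds. */
long long bench_malloc(long iterations, size_t block_size)
{
    void *blocks[NBLOCKS];
    long long start = now_us();

    for (long i = 0; i < iterations; i++) {
        for (int j = 0; j < NBLOCKS; j++) {
            blocks[j] = malloc(block_size);
            /* Touch the block so the compiler cannot drop the allocation. */
            if (blocks[j] != NULL)
                ((volatile char *)blocks[j])[0] = 1;
        }
        for (int j = 0; j < NBLOCKS; j++)
            free(blocks[j]);
    }
    return now_us() - start;
}
```

Calling `bench_malloc(1000, 64)` and printing the result with `printf("Elapsed time: %lld us\n", ...)` gives output in the same style as the numbers posted in this thread.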

Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 22-Sep-2009 16:33:36

[ #82 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

@Seiya

>1000 = 280001 us
>2000 = 599999 us
>10000 = 2840000 us

Your values are horribly bad.

I booted AmiKit and got not-so-good values too, but they are a lot faster than yours.
The reason for the slowdown is the patch "memtrailer 96", which is started in the startup-sequence and adds a 96-byte boundary to every memory allocation.

When you remove it, or change the 96 to 8, the benchmark gets faster.
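
To illustrate why a trailer on every allocation costs time and memory, here is a minimal sketch of the general technique. It is not the memtrailer patch itself, whose internals are not shown in this thread: `guarded_alloc`, `guarded_free` and the fill pattern are hypothetical, and malloc() stands in for the system allocator.

```c
#include <stdlib.h>
#include <string.h>

#define TRAILER_SIZE 96    /* the value from the "memtrailer 96" setting */
#define TRAILER_FILL 0xA5  /* fill pattern: an assumption for this sketch */

/* Allocate size bytes plus a trailer filled with a known pattern. */
void *guarded_alloc(size_t size)
{
    if (size > (size_t)-1 - TRAILER_SIZE)
        return NULL;                      /* size + trailer would overflow */
    unsigned char *p = malloc(size + TRAILER_SIZE);
    if (p != NULL)
        memset(p + size, TRAILER_FILL, TRAILER_SIZE);
    return p;
}

/* Free the block. Returns 1 if the trailer is intact, 0 if something
   wrote past the end of the block. */
int guarded_free(void *block, size_t size)
{
    unsigned char *p = block;
    int intact = 1;

    if (p == NULL)
        return 1;
    for (size_t i = 0; i < TRAILER_SIZE; i++) {
        if (p[size + i] != TRAILER_FILL) {
            intact = 0;
            break;
        }
    }
    free(p);
    return intact;
}
```

Every small allocation grows by 96 bytes and has 96 bytes written (and here also checked), which is a large relative overhead when the blocks themselves are small.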

But all in all, the speed of the AOS memory system depends a lot on luck.
I ran the test after booting, then started several programs, measured, closed them, measured again, and got different values from 2 sec up to 9.3 sec. An AMD64 3000+ (1.8 GHz clock) system with WinUAE is of course a lot faster than OS4 or MOS systems. You can see that when comparing with the values for the MOS 1.4 memory system, which needs 16 sec.

The fastest value I get is after the longest use:

New Shell process 6
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 3739999 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 9340000 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 3779996 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 8499994 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 4739998 µs
6.AmiKit:> NEW48G:amiga/AmiDevCpp/bernd/testprog/test.exe 100000
Elapsed time: 2000001 µs
6.AmiKit:>

But the MOS tlsfmem seems better than the 68k tlsfmem. When I use it, I do not get much of a speedup, and the test always runs in 1.9-2.1 sec.

But OK, WinUAE does not like code that executes only 2 instructions and then branches.
The JIT could be enhanced here so that such short loops do not check whether a chipset event occurred, but your PC must be a lot faster.

Last edited by bernd_afa on 22-Sep-2009 at 04:37 PM.
Last edited by bernd_afa on 22-Sep-2009 at 04:36 PM.
Last edited by bernd_afa on 22-Sep-2009 at 04:35 PM.

Status: Offline

Interesting

Re: Interesting memory allocation benchmark
Posted on 22-Sep-2009 16:35:23

[ #83 ]

Super Member

Joined: 29-Mar-2004
Posts: 1812
From: a place & time long long ago, when things mattered.

@umisef

Quote:
1.25GHz G4, 512k L2, 167MHz FSB, MacOS X Leopard

thx, now for the review.

_________________
"The system no longer works " -- Young Anakin Skywalker

Status: Offline

itix

Re: Interesting memory allocation benchmark
Posted on 22-Sep-2009 16:44:45

[ #84 ]

Elite Member

Joined: 22-Dec-2004
Posts: 3398
From: Freedom world

@bernd_afa

Quote:

Wipeout does not check memory on every memalloc/free. The Wipeout check has to be called, or a period added, so that it checks everything.

It checks (unless disabled from commandline options) if the allocation list is consistent before performing allocation or deallocation. It does not check for the pre/postwall damage.

Quote:

Have you tested on your Peg what the memtest values are when you use Wipeout?

It gets very slow. Surprisingly slow.

Quote:

TLSF+Wipeout:
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 21960671 µs
Varasto:Lähdekoodit/membench> membench 2000
Elapsed time: 46136320 µs
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 23054232 µs
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 23104360 µs
Varasto:Lähdekoodit/membench> membench 1000
Elapsed time: 23113834 µs

_________________
Amiga Developer
Amiga 500, Efika, Mac Mini and PowerBook

Status: Offline

afxgroup

Re: Interesting memory allocation benchmark
Posted on 22-Sep-2009 16:53:05

[ #85 ]

Super Member

Joined: 8-Mar-2004
Posts: 1968
From: Taranto, Italy

@all

Just for fun I've tested it on OS4.1 on my slow A1 G4-933 MHz... using malloc()/free()

Quote:

1000 iterations: Elapsed time: 16162 us (0.016162 s)
2000 iterations: Elapsed time: 32294 us (0.032294 s)
10000 iterations: Elapsed time: 161292 us (0.161292 s)
50000 iterations: Elapsed time: 809667 us (0.809667 s)
100000 iterations: Elapsed time: 1619899 us (1.619899 s)
1000000 iterations: Elapsed time: 16170085 us (16.170085 s)

...and so?
Has anyone tried this on their A1 yet?!

_________________
http://www.amigasoft.net

Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 22-Sep-2009 17:09:53

[ #86 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

@itix

Your values are over 20 times slower than my result. I think you used the test with the large memory blocks from Fab. It is possible that it takes long, because that test fills a lot of memory and gets cache misses. But in the real world, luckily, the most frequent memory allocations are small.

I looked in the Wipeout documentation and saw that the consistency check is off by default. I verified it and tested the speed with it switched on: 1000 iterations then need 120 sec, and even my fast WinUAE crawls on all operations.

CONSISTENCECHECK
NOCONSISTENCECHECK
    These options control whether Wipeout will run a consistency check on all
    its memory tracking data structures before performing memory deallocations
    and tracked allocations. This test may slow down the operation of the
    Amiga, especially on already very slow machines. Still, it is required to
    assure proper operation of the program; without consistent data
    structures, Wipeout will crash or fail to perform correctly.
    Default is NOCONSISTENCECHECK.

>@all

>just for fun i've tested it on OS4.1 on my slow A1 G4-933 mhz... using malloc()/free()

>Quote:

>1000 iterations: Elapsed time: 16162 us (0.016162 s)
>2000 iterations: Elapsed time: 32294 us (0.032294 s)
>10000 iterations: Elapsed time: 161292 us (0.161292 s)
>50000 iterations: Elapsed time: 809667 us (0.809667 s)
>100000 iterations: Elapsed time: 1619899 us (1.619899 s)
>1000000 iterations: Elapsed time: 16170085 us (16.170085 s)

>......and so?
>but anyone has tried this yet on his A1?!?!

Maybe the malloc()/free() implementation uses poolmem or another allocator.
When I compile this test for ixemul with poolmem, or for libnix, and use malloc, it runs 100000 iterations in 0.6 sec.

But that is of course not realistic, because there is a fresh pool that contains only 20 memory entries.

When I use ixemul with the buddy allocator, the test takes 10 sec, which is a lot slower, but NetSurf runs a lot faster with the buddy allocator.

Last edited by bernd_afa on 22-Sep-2009 at 05:29 PM.
Last edited by bernd_afa on 22-Sep-2009 at 05:28 PM.
Last edited by bernd_afa on 22-Sep-2009 at 05:26 PM.

Status: Offline

mike

Re: Interesting memory allocation benchmark
Posted on 23-Sep-2009 14:03:53

[ #87 ]

Regular Member

Joined: 31-Jul-2007
Posts: 406
From: Alpha Centauri

There doesn't seem to be any slowdown with gcc on x86. I tested and retested, and the results are the same for all versions.

mike@nellyphant:~/bench$ ./amemtest2 40960 1 3000000
3000000 iterations: Elapsed time: 275414 us (0.275414 s), 0.091805 us per
mike@nellyphant:~/bench$ ./amemtest2 40960 100 3000000
3000000 iterations: Elapsed time: 832398 us (0.832398 s), 0.277466 us per
mike@nellyphant:~/bench$ ./amemtest2 40960 1000 3000000
3000000 iterations: Elapsed time: 900612 us (0.900612 s), 0.300204 us per
mike@nellyphant:~/bench$ ./amemtest2 40960 10000 3000000
3000000 iterations: Elapsed time: 1886320 us (1.886320 s), 0.628773 us per

mike@nellyphant:~/bench$ gcc-2.95 amemtest2.c -o amemtest2-295 -O2
gcc-2.95
mike@nellyphant:~/bench$ ./amemtest2-295 40960 1 3000000
3000000 iterations: Elapsed time: 356774 us (0.356774 s), 0.118925 us per
mike@nellyphant:~/bench$ ./amemtest2-295 40960 1 3000000
3000000 iterations: Elapsed time: 342401 us (0.342401 s), 0.114134 us per
mike@nellyphant:~/bench$ gcc-3.3 amemtest2.c -o amemtest2-333 -O2
gcc-3.3
mike@nellyphant:~/bench$ gcc-3.3 amemtest2.c -o amemtest2-333 -O2
mike@nellyphant:~/bench$ ./amemtest2-333 40960 1 3000000
3000000 iterations: Elapsed time: 360595 us (0.360595 s), 0.120198 us per
mike@nellyphant:~/bench$ ./amemtest2-333 40960 1 3000000
3000000 iterations: Elapsed time: 362392 us (0.362392 s), 0.120797 us per
mike@nellyphant:~/bench$ ./amemtest2-41 40960 1 3000000
3000000 iterations: Elapsed time: 334351 us (0.334351 s), 0.111450 us per
mike@nellyphant:~/bench$ ./amemtest2-41 40960 1 3000000
3000000 iterations: Elapsed time: 338222 us (0.338222 s), 0.112741 us per

Last edited by mike on 23-Sep-2009 at 02:20 PM.
Last edited by mike on 23-Sep-2009 at 02:13 PM.
Last edited by mike on 23-Sep-2009 at 02:11 PM.
Last edited by mike on 23-Sep-2009 at 02:09 PM.
Last edited by mike on 23-Sep-2009 at 02:06 PM.

_________________
C= Amiga addict
,,,
(Oo)
⎛☮ໄ
ﮑὠՀ
Couldn't care less what other people think, seeing that there's concrete evidence they don't.

Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 24-Sep-2009 18:43:58

[ #88 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

The reason tlsfmem is slower on my WinUAE than on a 1 GHz Peg is that the 68k version uses lots of bfxxxx (bit field) instructions. The UAE JIT does not support such instructions, and they execute slowly in the interpreter. I think they are also not very fast on the 68060.

To get faster programs for WinUAE and real 68k, it is useful to compile with the
-mnobitfield option.

Last edited by bernd_afa on 24-Sep-2009 at 06:44 PM.

Status: Offline

wawa

Re: Interesting memory allocation benchmark
Posted on 24-Sep-2009 19:05:36

[ #89 ]

Elite Member

Joined: 21-Jan-2008
Posts: 6259
From: Unknown

@bernd_afa

then get the source from chris and optimize it;D

Status: Offline

mike

Re: Interesting memory allocation benchmark
Posted on 24-Sep-2009 20:34:08

[ #90 ]

Regular Member

Joined: 31-Jul-2007
Posts: 406
From: Alpha Centauri

@bernd_afa

tlsfmem allocatorbenchmark results without a startup-sequence

allocatorbenchmark RANDOMLY
Testing overhead: 0.181 secs.
Time: 11.481 secs for 50000 Allocs 47146 Free Ops. Largest: 3537464

allocatorbenchmark
Testing overhead: 0.185 secs.
Time: 1.407 secs for 50000 Allocs 49654 Free Ops. Largest: 449584

With tlsfmem
allocatorbenchmark RANDOMLY
Testing overhead: 0.181 secs.
Time: 2.628 secs for 50000 Allocs 48210 Free Ops. Largest: 2384784

allocatorbenchmark
Testing overhead: 0.185 secs.
Time: 2.429 secs for 50000 Allocs 49644 Free Ops. Largest: 528080

Anybody got an 040 handy?

Btw, didn't you get the source already?
Or rather permission to use a disassembly?

(doing membench now)

OK, now I'm successfully confused...

http://i35.tinypic.com/2vjput5.png

Last edited by mike on 24-Sep-2009 at 08:46 PM.
Last edited by mike on 24-Sep-2009 at 08:38 PM.
Last edited by mike on 24-Sep-2009 at 08:35 PM.


Status: Offline

matthey

Re: Interesting memory allocation benchmark
Posted on 25-Sep-2009 3:35:04

[ #91 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2750
From: Kansas

@bernd_afa

The BFFFO instruction is one of the key elements behind the constant-time behaviour of TLSFmem. It finds the first set bit in a bit field. BFFFO could be replaced with AND and branch instructions, but I think that would be slower on average on the 68060: BFFFO takes 9 cycles and is pOEP-only, but a single mispredicted branch costs 7 cycles. The problem is that C compilers often use the bit field instructions where they shouldn't. They can be faster on the 68060, and they are powerful. They should be used on the 68040 for the fastest performance. UAE moved towards a RISC philosophy by giving up on the less-used instructions. You say branches are not so fast on UAE either. That doesn't leave any choices if you want TLSFmem to be fast on UAE.
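
As an illustration of the AND-and-branch replacement mentioned above, here is a minimal C sketch of a find-first-one over a full 32-bit field, counting the offset from the most significant bit as BFFFO does. The function name is hypothetical, and this is not code from TLSFmem; it only shows the shape of the branchy alternative (five tests instead of one instruction).

```c
#include <stdint.h>

/* Offset of the first set bit in a 32-bit field, where offset 0 is the
   most significant bit. Returns 32 when no bit is set, which matches
   BFFFO's "offset + width" result for an empty field at offset 0. */
unsigned find_first_one(uint32_t x)
{
    unsigned offset = 0;

    if (x == 0)
        return 32;
    /* Binary search: halve the candidate range with each AND + branch. */
    if ((x & 0xFFFF0000u) == 0) { offset += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { offset += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { offset += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { offset += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { offset += 1; }
    return offset;
}
```

Each of the five tests is a potential mispredicted branch, which is the cost being weighed against the single 9-cycle BFFFO.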

Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 26-Sep-2009 12:50:46

[ #92 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

I can reassemble tlsfmem, but I have not done anything yet. There is also an update (1.6) of tlsfmem in the Luciferin archive that Chris made.

There is also poolmem on Aminet, which gives more constant values: my system then needs 6.3-6.9 sec for 100000 calls.

But its documentation also says that it does not work with memory trackers.
So someone would need to compile a clean solution that allocates the memory and munges it.

I still don't understand why such memory tracker tools do not work well, as the readme states.

OK, the memory is not filled correctly at start, because there is no memlist, but then nothing is filled at all.
There should still be no crash possible.

On FreeMem, Wipeout first fills the memory with values and then releases it.
Or maybe UAE has a bug with the BFFFO instruction.

Can somebody test whether tlsfmem and Wipeout work well together on a classic Amiga?

When I start YAM, it crashes. Crashes happen often when I use MUI programs, but only when I use Wipeout.

Last edited by bernd_afa on 26-Sep-2009 at 12:52 PM.
Last edited by bernd_afa on 26-Sep-2009 at 12:51 PM.

Status: Offline

mike

Re: Interesting memory allocation benchmark
Posted on 26-Sep-2009 17:12:29

[ #93 ]

Regular Member

Joined: 31-Jul-2007
Posts: 406
From: Alpha Centauri

@bernd_afa

Wipeout crashes my amiga


Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 27-Sep-2009 15:05:42

[ #94 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

I took a look at the newlib code. There is a link that explains the memory allocator it uses.

http://g.oswego.edu/dl/html/malloc.html

So this explains why afxgroup gets good speed on OS4: it handles most memory allocations without going through the OS4 memory system.

newlib is a BSD-style library similar to ixemul, but because newlib is a stripped-down version for embedded systems, some things are missing.

I don't know whether this memory system is better or worse than the buddy allocator used in ixemul.
It also contains a debug setting. Maybe in debug mode it is faster than tlsfmem plus Wipeout and also gives the same memory safety as Wipeout.

I think for best compatibility it is best to add a fast memory system only to malloc/free. An AmigaOS coder who needs speed can then use the malloc functions too, once this is in libnix or ixemul.

Last edited by bernd_afa on 27-Sep-2009 at 03:12 PM.
Last edited by bernd_afa on 27-Sep-2009 at 03:08 PM.
Last edited by bernd_afa on 27-Sep-2009 at 03:07 PM.

Status: Offline

NutsAboutAmiga

Re: Interesting memory allocation benchmark
Posted on 27-Sep-2009 16:04:34

[ #95 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@bernd_afa

Not at all, everything has to go through the OS4 memory system somehow.

But afxgroup might have a newer kernel than most. I bet they have paid attention to benchmarks ever since the first MOS vs OS4 benchmark was shown.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 27-Sep-2009 17:03:45

[ #96 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

@NutsAboutAmiga

>Not at all, everything has to go through the OS4 memory system somehow.

Only large blocks are allocated, and memory is very seldom allocated from the OS. That is the reason why the speed on OS4 goes up.

ixemul does the same.

When you use malloc with ixemul in C++ programs, or in this benchmark,
malloc and free are much faster than the original Amiga AllocMem functions.

Status: Offline

NutsAboutAmiga

Re: Interesting memory allocation benchmark
Posted on 27-Sep-2009 17:05:48

[ #97 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@bernd_afa

Memory comes from the same free pool, it has to be done by the OS, does it not?

Last edited by NutsAboutAmiga on 27-Sep-2009 at 05:08 PM.


Status: Offline

bernd_afa

Re: Interesting memory allocation benchmark
Posted on 28-Sep-2009 8:32:15

[ #98 ]

Cult Member

Joined: 14-Apr-2006
Posts: 829
From: Unknown

@NutsAboutAmiga

>Memory comes from the same free pool, it has to be done by the OS, does it not?

malloc-style allocators allocate a large block and split it into smaller blocks. If a program only needs to allocate some memory and free it frequently, this is handled internally by the malloc functions.

AOS libnix also uses poolmem. This creates a memory pool for a program and allocates 4 KB every time it needs more memory.

When a program allocates 100*32 bytes, only 3200 bytes are needed and no global allocation is done. AOS poolmem has fragmentation problems too, but they are not noticeable in this memory benchmark.
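
The pooling idea can be sketched in a few lines of C. This is not libnix or AOS poolmem code: the names `pool_alloc`, `pool_destroy` and `os_allocs` are hypothetical, malloc() stands in for the global OS allocation, and individual blocks are never returned to the pool, which keeps the sketch short.

```c
#include <stddef.h>
#include <stdlib.h>

#define PUDDLE_SIZE 4096  /* the 4 KB growth step mentioned in the post */

/* Header at the start of each 4 KB block taken from the "OS". */
struct puddle {
    struct puddle *next;  /* previously used puddle, kept for cleanup */
    size_t used;          /* payload bytes already handed out */
};

struct pool {
    struct puddle *head;  /* puddle currently being carved up */
    long os_allocs;       /* how often the global allocator was called */
};

#define PAYLOAD (PUDDLE_SIZE - sizeof(struct puddle))

/* Hand out size bytes from the current puddle; only fetch a new 4 KB
   puddle from the global allocator when the current one is full. */
void *pool_alloc(struct pool *p, size_t size)
{
    if (size == 0 || size > PAYLOAD)
        return NULL;                     /* sketch: no oversized blocks */
    size = (size + 7) & ~(size_t)7;      /* keep blocks 8-byte aligned */

    if (p->head == NULL || p->head->used + size > PAYLOAD) {
        /* malloc() stands in for the global OS allocation here. */
        struct puddle *fresh = malloc(PUDDLE_SIZE);
        if (fresh == NULL)
            return NULL;
        fresh->next = p->head;
        fresh->used = 0;
        p->head = fresh;
        p->os_allocs++;
    }
    void *block = (char *)(p->head + 1) + p->head->used;
    p->head->used += size;
    return block;
}

/* Give all puddles back to the global allocator. */
void pool_destroy(struct pool *p)
{
    while (p->head != NULL) {
        struct puddle *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}
```

With a zero-initialised `struct pool`, 100 calls to `pool_alloc(&pool, 32)` need 3200 bytes and leave `os_allocs` at 1, which is the point made above: the global allocator is touched once, not 100 times.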

On my WinUAE system 100000 mallocs need only 0.6 sec.

newlib uses the Doug Lea memory allocator, dlmalloc for short, which does some kind of caching.

Please read the text about caching in the newlib allocator. There you can see that malloc seldom calls the OS4 memory allocation.

http://g.oswego.edu/dl/html/malloc.html

""""
Caching
In the most straightforward version of the basic algorithm, each freed chunk is immediately coalesced with neighbors to form the largest possible unused chunk. Similarly, chunks are created (by splitting larger chunks) only when explicitly requested.

Operations to split and to coalesce chunks take time. This time overhead can sometimes be avoided by using either of both of two caching strategies:

Deferred Coalescing
    Rather than coalescing freed chunks, leave them at their current sizes in hopes that another request for the same size will come along soon. This saves a coalesce, a later split, and the time it would take to find a non-exactly-matching chunk to split.

Preallocation
    Rather than splitting out new chunks one-by one, pre-split many at once. This is normally faster than doing it one-at-a-time.

Because the basic data structures in the allocator permit coalescing at any time, in any of malloc, free, or realloc, corresponding caching heuristics are easy to apply.

The effectiveness of caching obviously depends on the costs of splitting, coalescing, and searching relative to the work needed to track cached chunks. Additionally, effectiveness less obviously depends on the policy used in deciding when to cache versus coalesce them.

Caching can be a good idea in programs that continuously allocate and release chunks of only a few sizes. For example, if you write a program that allocates and frees many tree nodes,
....
""""

Last edited by bernd_afa on 28-Sep-2009 at 08:32 AM.
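
The deferred-coalescing idea from the quoted text can be sketched as a per-size cache of freed chunks. This is an illustration only, not dlmalloc or newlib code: the names, the 16-byte size classes and the use of malloc()/free() as the underlying allocator are all assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

#define CLASS_STEP  16   /* size classes of 16, 32, ..., 128 bytes (assumed) */
#define NUM_CLASSES 8
#define MAX_CACHED  (CLASS_STEP * NUM_CLASSES)

struct chunk_cache {
    void *free_list[NUM_CLASSES];  /* one LIFO list of freed chunks per class */
    long hits;                     /* requests served from the cache */
    long misses;                   /* requests passed to the real allocator */
};

/* Serve a request from the cache when a chunk of the same size class
   was freed earlier; otherwise fall back to malloc(). */
void *cache_alloc(struct chunk_cache *c, size_t size)
{
    if (size == 0 || size > MAX_CACHED)
        return malloc(size);               /* not cached in this sketch */

    size_t cls = (size - 1) / CLASS_STEP;  /* 1..16 -> 0, 17..32 -> 1, ... */
    void *chunk = c->free_list[cls];
    if (chunk != NULL) {
        c->free_list[cls] = *(void **)chunk;  /* pop: next pointer lives in the chunk */
        c->hits++;
        return chunk;
    }
    c->misses++;
    return malloc((cls + 1) * CLASS_STEP);    /* round up to the class size */
}

/* Keep the chunk at its current size instead of returning it, in the
   hope that the same size is requested again soon. */
void cache_free(struct chunk_cache *c, void *chunk, size_t size)
{
    if (chunk == NULL)
        return;
    if (size == 0 || size > MAX_CACHED) {
        free(chunk);
        return;
    }
    size_t cls = (size - 1) / CLASS_STEP;
    *(void **)chunk = c->free_list[cls];      /* push onto the class list */
    c->free_list[cls] = chunk;
}
```

A benchmark that repeatedly allocates and frees a few fixed sizes hits the cache almost every time after the first round, so the underlying allocator (and with it the OS) is rarely called. Unlike free(), `cache_free` needs the size passed in, because the sketch keeps no per-chunk header.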

Status: Offline

Tomppeli

Re: Interesting memory allocation benchmark
Posted on 28-Sep-2009 14:13:10

[ #99 ]

Super Member

Joined: 18-Jun-2004
Posts: 1653
From: Home land of Santa, sauna, sisu and salmiakki

@thread

(Edit 2...)
My personal opinion on the debate about different memory subsystems: as long as the system lacks full memory protection, buggy apps force the end user to reboot frequently enough that memory fragmentation doesn't matter. But once there is full memory protection and end users expect to run their computers 24/7, 52 weeks a year, any system that keeps fragmentation low and can defragment memory automatically is worth more than its weight in gold.

Last edited by Tomppeli on 28-Sep-2009 at 08:03 PM.
Last edited by Tomppeli on 28-Sep-2009 at 02:20 PM.
Last edited by Tomppeli on 28-Sep-2009 at 02:15 PM.

_________________
Rock lobster bit me. My Workbench has always preferences. X1000 + AmigaOS4.1 FE
"Anyone can build a fast CPU. The trick is to build a fast system." -Seymour Cray

Status: Offline

Fab

Re: Interesting memory allocation benchmark
Posted on 28-Sep-2009 14:22:57

[ #100 ]

Super Member

Joined: 17-Mar-2004
Posts: 1178
From: Unknown

@Tomppeli

Not exactly sure what you mean, but the results i got perfectly matched the actual spent time. And i was never able to complete the 1000000 iteration test on OS4 even though i waited more than 30 minutes :)

Status: Offline


Amigaworld.net was originally founded by David Doyle