Poster | Thread | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 27-Feb-2012 14:09:19
| | [ #121 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @broadblues
You can also use it in the Wave Editor; that should work. You can use the pre-listen function or, if realtime is not possible, render it to a test sample, listen, and then undo.
At the beginning I was also aiming for a steeper filter, but people who are more knowledgeable than me told me to use less steepness.
The reasons are the problems with sound events at the filter-band boundaries, and the ringing introduced by IIR filters. The steeper the filter, the stronger the ringing. You can only use more coefficients for the filter, but then you also need more CPU power.
In many situations those things won't be noticeable, of course. But if the filters are accurate and the X-frequency-jumping problem is reduced, I simply have a better conscience when running the effect over my precious master recording...
If I have time I will experiment with steeper filters. Till now, I have had quite good results with the 12 dB filters. _________________ -- Author of HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more... Homepage: http://www.hd-rec.de |
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 27-Feb-2012 14:49:59
| | [ #122 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| @Wanderer
Quote:
and the ringing introduced by IIR filters.
|
I have Jamin set to use FFT filters (which is the default). It's an awfully long time since I studied digital filters at university, so I can't remember the difference, but could this be a factor in producing a less distorted steep filter?
Anyway, I'm certainly no expert in this field. I will keep trying it out when I need to master a video sound check etc., because the less often I have to switch my Linux box on the better!
_________________ BroadBlues On Blues BroadBlues On Amiga Walker Broad |
| Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 27-Feb-2012 15:57:53
| | [ #123 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @broadblues
Yes, FFT allows you to do very steep filters with low ringing (e.g. if properly Hanning-windowed).
But that is too heavy for your SAM. See the FFT benchmarks in this thread. Even on WinUAE, I would not want the multiband compressor to eat up more than, let's say, 10% of my CPU, since I have plenty of other stuff running. I used the FFT only in the Denoiser, since there it is essential; plus, this is a typical "offline" effect that you render into the wave sample.
| Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 4-Mar-2012 22:04:48
| | [ #124 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @Thread
I have written an interesting benchmark here: FFTDemo
It is basically the same as the FFT demo posted before by Bernd, but it is available as Amiblitz and C code. I compiled binaries for Windows and 68K Amiga. There is an additional test number that is a checksum of the FFT calculations. This ensures that the FFT is actually computed and not bypassed.
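The checksum idea can be sketched as follows. This is only an illustration under assumptions, not the demo's real code: the struct `cplx` stands in for the demo's complex float type, and `checksum_bins` is a made-up name. The point is that the sum depends on the FFT output and is printed at the end, so the compiler cannot simply drop the transform as dead code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the demo's complex float type. */
typedef struct { float r, i; } cplx;

/* Add up every stride-th coefficient of the transformed buffer.
 * The caller accumulates the return value across all loops and prints it. */
double checksum_bins(const cplx *buf, size_t n, size_t stride)
{
    assert(buf != NULL && stride > 0);
    double sum = 0.0;
    for (size_t k = 0; k < n; k += stride)
        sum += (double)buf[k].r + (double)buf[k].i;
    return sum;
}
```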
Here are my (surprising, or not so surprising) results:
Test Results on my Laptop (i5@2.5GHz):
WinUAE + GCC: Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 3620ms for 4096000 samples, => 12.83x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
WinUAE + AmiBlitz3 Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float) Time needed 2577ms for 4096000 samples, => 18.02x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)
Win32: Speed test for FFT + iFFT: (C, x86/Win32, float) Time needed 223ms for 4096000 samples, => 208.25x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
What does that mean?
1. WinUAE wastes a factor of at least 10(!) when running 68K code vs. native.
2. GCC is slower than Amiblitz, or I misconfigured the optimizer. Both GCC and Amiblitz have a quite naive implementation of the FFT; no special actions were taken to make it particularly fast on the code level.
3. You can easily create a PPC native version of this test for MOS and OS4 and see how fast you can get natively, and how fast the 68K emu is.
The results are consistent with the test before. The only difference is that I run 10x more loops to be more accurate, and it tests mono FFT, which means the RTF result is twice as fast because it runs on a single channel instead of two channels.
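For anyone who wants to check the "x speed" figures by hand, here is a small sketch of the arithmetic. The division by 2 is inferred from the posted numbers (they only match if the 4096000 samples count the FFT and the iFFT pass together); the source does not state this, so treat it as an assumption. The function name `realtime_factor` is made up.

```c
#include <assert.h>

/* Real-time factor: seconds of audio processed per second of CPU time.
 * Assumption: 'samples' counts FFT and iFFT passes together, hence the / 2. */
double realtime_factor(long samples, long elapsed_ms, double sample_rate)
{
    assert(samples > 0 && elapsed_ms > 0 && sample_rate > 0.0);
    double audio_seconds = (double)samples / sample_rate / 2.0;
    return audio_seconds / ((double)elapsed_ms / 1000.0);
}
```

With the WinUAE + GCC result above (3620 ms for 4096000 samples at 44100 Hz) this gives about 12.83, and with the Win32 result (223 ms) about 208.25, matching the posted output.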
The 68K Amiga Version was compiled with this command line (GCC) gcc FFT.c FFTDemo.c -o2 -o ../FFTDemo_68K_GCC -m68040 -m68881
And with Amiblitz3: Amiblitz3 -s FFTDemo.ab3 -e /FFTDemo_68K_AB3 -release
The x86 Win32 Version was compiled with this command line (Visual Studio) /O2 /Ob2 /GL /D "WIN32" /FD /MD /fp:fast /W3 /nologo /c /Zi
Now I am quite curious about PPC results, and whether someone can explain why GCC loses so badly against Amiblitz3. What I know:
1. Amiblitz has FPU register optimization, but GCC should have this too, no?
2. Amiblitz does not have array indexing and must compute the memory offset of the complex floats manually. I would have assumed that this would be slower.
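As an illustration of point 2, the two access styles look roughly like this in C. This is a sketch only: `cplx` is a stand-in for the demo's complex float type and both function names are hypothetical. A compiler that sees the indexed form can usually fold the multiplication by the element size into a scaled addressing mode, which is one reason manual offsets are not automatically slower or faster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the demo's complex float type (8 bytes). */
typedef struct { float r, i; } cplx;

/* Array indexing: the compiler computes base + k * sizeof(cplx). */
float real_by_index(const cplx *buf, size_t k)
{
    return buf[k].r;
}

/* Manual offset: the same address computed by hand in bytes. */
float real_by_offset(const void *base, size_t k)
{
    assert(base != NULL);
    const unsigned char *p = (const unsigned char *)base + k * sizeof(cplx);
    return *(const float *)p;
}
```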
Last edited by Wanderer on 04-Mar-2012 at 10:12 PM. Last edited by Wanderer on 04-Mar-2012 at 10:10 PM.
| Status: Offline |
| | bernd_afa
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 11:40:54
| | [ #125 ] |
| |
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @Wanderer
Can you post the x86 asm output from Visual C for the main CPU cycle function? You can also take a screenshot of the VC debugger in asm output mode that shows the asm code of the function where most of the time is spent.
Is this the right one?
void _fftD(fftH *fft, fftCD *buffer, int inverse) {
I don't think that native x86 can be so much faster. Also, the x86 file is very small. It would be best to do the same as in your FFT include and print some values that show the code is executed correctly, in case Visual C does an optimizer trick because it knows your code does nothing useful.
But when you post the asm output, we can also see which asm instructions are produced.
Last edited by bernd_afa on 06-Mar-2012 at 11:49 AM. Last edited by bernd_afa on 06-Mar-2012 at 11:42 AM. Last edited by bernd_afa on 06-Mar-2012 at 11:41 AM. Last edited by bernd_afa on 06-Mar-2012 at 11:41 AM.
|
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 13:02:00
| | [ #126 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| @Wanderer
Quote:
The 68K Amiga Version was compiled with this command line (GCC) gcc FFT.c FFTDemo.c -o2 -o ../FFTDemo_68K_GCC -m68040 -m68881
|
-o2 for optimisation? Should that not be -O2?
| Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 15:45:04
| | [ #127 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @bernd_afa
Here is the x86 code in debug mode (I don't know how to display the disassembly in release mode):
void _fft(fftH *fft, fftCF *buffer, int inverse) {
  004119C0 push ebp
  004119C1 mov ebp,esp
  004119C3 sub esp,74h
  004119C6 push ebx
  004119C7 push esi
  004119C8 push edi
float ur,ui,wr,wi,tr,ti;
int k,j,i;
int le = 0;
  004119C9 mov dword ptr [le],0
int le2 = 1;
  004119D0 mov dword ptr [le2],1
float w = ((float)M_PI);
  004119D7 fld dword ptr [__real@40490fdb (415884h)]
  004119DD fstp dword ptr [w]
float df=1.0f;
  004119E0 fld1
  004119E2 fstp dword ptr [df]
if (inverse) df = -1.0f;
  004119E5 cmp dword ptr [inverse],0
  004119E9 je _fft+34h (4119F4h)
  004119EB fld dword ptr [__real@bf800000 (41587Ch)]
  004119F1 fstp dword ptr [df]
for (k = 0; k < fft->order; k++) {
  004119F4 mov dword ptr [k],0
  004119FB jmp _fft+46h (411A06h)
  004119FD mov eax,dword ptr [k]
  00411A00 add eax,1
  00411A03 mov dword ptr [k],eax
  00411A06 mov eax,dword ptr [fft]
  00411A09 mov ecx,dword ptr [k]
  00411A0C cmp ecx,dword ptr [eax]
  00411A0E jge _fft+1BEh (411B7Eh)
le = le2;
  00411A14 mov eax,dword ptr [le2]
  00411A17 mov dword ptr [le],eax
le2 <<= 1;
  00411A1A mov eax,dword ptr [le2]
  00411A1D shl eax,1
  00411A1F mov dword ptr [le2],eax
ur = 1.0f;
  00411A22 fld1
  00411A24 fstp dword ptr [ur]
ui = 0.0f;
  00411A27 fldz
  00411A29 fstp dword ptr [ui]
wr = cosf(w);
  00411A2C fld dword ptr [w]
  00411A2F sub esp,8
  00411A32 fstp qword ptr [esp]
  00411A35 call @ILT+225(_cos) (4110E6h)
  00411A3A add esp,8
  00411A3D fstp dword ptr [wr]
wi = sinf(w) * df;
  00411A40 fld dword ptr [w]
  00411A43 sub esp,8
  00411A46 fstp qword ptr [esp]
  00411A49 call @ILT+190(_sin) (4110C3h)
  00411A4E add esp,8
  00411A51 fmul dword ptr [df]
  00411A54 fstp dword ptr [wi]
w /= 2.0f;
  00411A57 fld dword ptr [w]
  00411A5A fdiv qword ptr [__real@4000000000000000 (415870h)]
  00411A60 fstp dword ptr [w]
for (j = 0; j < le; j++) {
  00411A63 mov dword ptr [j],0
  00411A6A jmp _fft+0B5h (411A75h)
  00411A6C mov eax,dword ptr [j]
  00411A6F add eax,1
  00411A72 mov dword ptr [j],eax
  00411A75 mov eax,dword ptr [j]
  00411A78 cmp eax,dword ptr [le]
  00411A7B jge _fft+1B9h (411B79h)
for (i = j; i < fft->npoints; i += le2) {
  00411A81 mov eax,dword ptr [j]
  00411A84 mov dword ptr [i],eax
  00411A87 jmp _fft+0D2h (411A92h)
  00411A89 mov eax,dword ptr [i]
  00411A8C add eax,dword ptr [le2]
  00411A8F mov dword ptr [i],eax
  00411A92 mov eax,dword ptr [fft]
  00411A95 mov ecx,dword ptr [i]
  00411A98 cmp ecx,dword ptr [eax+4]
  00411A9B jge _fft+18Ch (411B4Ch)
tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
  00411AA1 mov eax,dword ptr [i]
  00411AA4 add eax,dword ptr [le]
  00411AA7 mov ecx,dword ptr [buffer]
  00411AAA fld dword ptr [ecx+eax*8]
  00411AAD fmul dword ptr [ur]
  00411AB0 mov edx,dword ptr [i]
  00411AB3 add edx,dword ptr [le]
  00411AB6 mov eax,dword ptr [buffer]
  00411AB9 fld dword ptr [eax+edx*8+4]
  00411ABD fmul dword ptr [ui]
  00411AC0 fsubp st(1),st
  00411AC2 fstp dword ptr [tr]
ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
  00411AC5 mov eax,dword ptr [i]
  00411AC8 add eax,dword ptr [le]
  00411ACB mov ecx,dword ptr [buffer]
  00411ACE fld dword ptr [ecx+eax*8]
  00411AD1 fmul dword ptr [ui]
  00411AD4 mov edx,dword ptr [i]
  00411AD7 add edx,dword ptr [le]
  00411ADA mov eax,dword ptr [buffer]
  00411ADD fld dword ptr [eax+edx*8+4]
  00411AE1 fmul dword ptr [ur]
  00411AE4 faddp st(1),st
  00411AE6 fstp dword ptr [ti]
buffer[i+le].r = buffer[i].r - tr;
  00411AE9 mov eax,dword ptr [i]
  00411AEC mov ecx,dword ptr [buffer]
  00411AEF fld dword ptr [ecx+eax*8]
  00411AF2 fsub dword ptr [tr]
  00411AF5 mov edx,dword ptr [i]
  00411AF8 add edx,dword ptr [le]
  00411AFB mov eax,dword ptr [buffer]
  00411AFE fstp dword ptr [eax+edx*8]
buffer[i+le].i = buffer[i].i - ti;
  00411B01 mov eax,dword ptr [i]
  00411B04 mov ecx,dword ptr [buffer]
  00411B07 fld dword ptr [ecx+eax*8+4]
  00411B0B fsub dword ptr [ti]
  00411B0E mov edx,dword ptr [i]
  00411B11 add edx,dword ptr [le]
  00411B14 mov eax,dword ptr [buffer]
  00411B17 fstp dword ptr [eax+edx*8+4]
buffer[i].r += tr;
  00411B1B mov eax,dword ptr [i]
  00411B1E mov ecx,dword ptr [buffer]
  00411B21 fld dword ptr [ecx+eax*8]
  00411B24 fadd dword ptr [tr]
  00411B27 mov edx,dword ptr [i]
  00411B2A mov eax,dword ptr [buffer]
  00411B2D fstp dword ptr [eax+edx*8]
buffer[i].i += ti;
  00411B30 mov eax,dword ptr [i]
  00411B33 mov ecx,dword ptr [buffer]
  00411B36 fld dword ptr [ecx+eax*8+4]
  00411B3A fadd dword ptr [ti]
  00411B3D mov edx,dword ptr [i]
  00411B40 mov eax,dword ptr [buffer]
  00411B43 fstp dword ptr [eax+edx*8+4]
}
  00411B47 jmp _fft+0C9h (411A89h)
tr = ur*wr - ui*wi;
  00411B4C fld dword ptr [ur]
  00411B4F fmul dword ptr [wr]
  00411B52 fld dword ptr [ui]
  00411B55 fmul dword ptr [wi]
  00411B58 fsubp st(1),st
  00411B5A fstp dword ptr [tr]
ui = ur*wi + ui*wr;
  00411B5D fld dword ptr [ur]
  00411B60 fmul dword ptr [wi]
  00411B63 fld dword ptr [ui]
  00411B66 fmul dword ptr [wr]
  00411B69 faddp st(1),st
  00411B6B fstp dword ptr [ui]
ur = tr;
  00411B6E fld dword ptr [tr]
  00411B71 fstp dword ptr [ur]
}
  00411B74 jmp _fft+0ACh (411A6Ch)
}
  00411B79 jmp _fft+3Dh (4119FDh)
}
  00411B7E pop edi
  00411B7F pop esi
  00411B80 pop ebx
  00411B81 mov esp,ebp
  00411B83 pop ebp
  00411B84 ret
Quote:
Is this the right one?
void _fftD(fftH *fft, fftCD *buffer, int inverse) {
|
This is the double-precision version of the FFT. I use the single-precision one, which is called _fft().
This is the speed result of the debug version: Speed test for FFT + iFFT: (C, x86/Win32, float) Time needed 646ms for 4096000 samples, => 71.89x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
Still much faster than WinUAE. The x86 file is very small; I do believe it does a lot of dead code elimination. As I wrote above, the FFT is truly executed, since it computes the checksum (1.9523). The optimizer would have to be very clever to cheat this (not impossible, though).
Last edited by Wanderer on 06-Mar-2012 at 03:47 PM.
| Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 15:51:58
| | [ #128 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @broadblues
I tried -O2. Now the optimizer is on. I don't know why -o2 didn't throw an error message.
Here is the Result:
Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 1620ms for 4096000 samples, => 28.67x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
More than twice as fast. Now GCC wins over Amiblitz.
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:13:43
| | [ #129 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| @Wanderer
10.RAM Disk:FFTDemo> FFTDemo_PPC_GCC Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 19872ms for 4096000 samples, => 2.34x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
This was on my SAM-Flex 733 without any modifications to the code, just a straight recompile (hence it reports 68040 when it's really PPC).
Here is the Amiblitz version on my SAM:
10.RAM Disk:FFTDemo> FFTDemo_68K_AB3 Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float) Time needed 28561ms for 4096000 samples, => 1.63x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)
I can't run the 68k GCC version as I don't have any ixemul.library installed (I used to have one; I'll see if I can find it).
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:26:36
| | [ #130 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| | Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:28:55
| | [ #131 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @broadblues
Sorry, the check for 68040 is just a check for the AMIGA define; I didn't make it that sophisticated.
2.34x speed native @733MHz? Hm... On the other hand, the JIT seems to be pretty good. Given that the GCC version is now 50% faster than the Amiblitz version, that would be approximately the same speed as native. I assume you used the uppercase -O2?
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:29:57
| | [ #132 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| @Wanderer
Quote:
I don't know why -o2 didn't throw an error message.
|
It's a valid option, setting the output file to "2"; then your later switch -o FFT... overrode it. gcc allows this kind of thing so that you can vary switches along the command line for different inputs.
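One way to catch this kind of slip is to let the binary report how it was built. GCC predefines `__OPTIMIZE__` whenever an optimizing level such as -O2 is active, and additionally `__OPTIMIZE_SIZE__` for -Os. The helper names below are hypothetical, and other compilers may not define these macros at all, so this is a sketch rather than something the benchmark actually does.

```c
#include <assert.h>
#include <stdio.h>

/* Report the optimizer state as seen by GCC's predefined macros. */
const char *optimizer_state(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized (or compiler does not say)";
#endif
}

/* Print one line that could go next to the benchmark result. */
void print_build_info(void)
{
    const char *state = optimizer_state();
    assert(state != NULL);
    printf("Build: %s\n", state);
}
```

With the lowercase -o2 command line quoted above, this would have reported a non-optimized build.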
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:34:11
| | [ #133 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| | Status: Offline |
| | bernd_afa
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:35:22
| | [ #134 ] |
| |
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @Wanderer
It's interesting to see how fast x86 native is when no optimizer is used. It compares well with native GCC, because then similar instructions are executed on all CPUs.
I compiled the 68k C version myself in AmiDevCpp. It's right: only -O3 is OK.
Your C version gets this on my system:
15.h0:wbstartup> "Ram Disk:FFTDemo/FFTDemo_68K_GCC" Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 3240ms for 4096000 samples, => 14.33x speed @44100Hz/mono (test=1.9523, should be ~1.9523) 15.h0:wbstartup>
My 68k C build with option -O3:
15.h0:wbstartup> h1:amidevcpp/bernd/test/fftbench.exe Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 920ms for 4096000 samples, => 50.48x speed @44100Hz/mono (test=1.9523, should be ~1.9523) 15.h0:wbstartup>
And Amiblitz is slower:
15.h0:wbstartup> "Ram Disk:FFTDemo_68K_AB3" Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float) Time needed 2114ms for 4096000 samples, => 21.97x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)
Looking at the GCC code for 68k, the generated code looks quite different from the formula in your source. This is the speed-critical part:
for (j = 0; j < le; j++) {
  for (i = j; i < fft->npoints; i += le2) {
    tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
    ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
    buffer[i+le].r = buffer[i].r - tr;
    buffer[i+le].i = buffer[i].i - ti;
    buffer[i].r += tr;
    buffer[i].i += ti;
  }
Maybe FFT is a well-known standard function and the GCC optimizer then uses an optimized snippet for it:
125D52DA): FSMOVE.S 0(A0,D4.L*8),FP0
125D52E0): FSMOVE.X FP0,FP3
125D52E4): FSMUL.X FP4,FP3
125D52E8): FSMOVE.S 4(A0,D4.L*8),FP1
125D52EE): FSMOVE.X FP1,FP2
125D52F2): FSMUL.X FP5,FP2
125D52F6): FSSUB.X FP2,FP3
125D52FA): FSMUL.X FP5,FP0
125D52FE): FSMUL.X FP4,FP1
125D5302): FSADD.X FP1,FP0
125D5306): FSMOVE.S (A0),FP2
125D530A): FSSUB.X FP3,FP2
125D530E): FMOVE.S FP2,0(A0,D4.L*8)
125D5314): FSMOVE.S 4(A0),FP1
125D531A): FSSUB.X FP0,FP1
125D531E): FMOVE.S FP1,4(A0,D4.L*8)
125D5324): FSADD.S (A0),FP3
125D5328): FMOVE.S FP3,(A0)
125D532C): FSADD.S 4(A0),FP0
125D5332): FMOVE.S FP0,4(A0)
125D5338): ADD.L D5,D0
125D533A): ADDA.L D2,A0
125D533C): MOVE.L 4(A2),D1
125D5340): CMP.L D1,D0
125D5342): BLT.S __fft+$CA ;125D52DA
Here is the result of the x86 native version:
C:\Users\pc>H:\test\FFTDemo_Win32.exe Speed test for FFT + iFFT: (C, x86/Win32, float) Time needed 296ms for 4096000 samples, => 156.89x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
x86 native is then only about 3.1x faster. It would be interesting to see what asm code they use to get such a big speedup.
Last edited by bernd_afa on 06-Mar-2012 at 04:45 PM. Last edited by bernd_afa on 06-Mar-2012 at 04:35 PM.
|
| Status: Offline |
| | broadblues
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:40:00
| | [ #135 ] |
| |
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| @broadblues
Wow, this one is a bit surprising: I used -Os (optimise for size) instead and got a substantial speedup!
10.RAM Disk:FFTDemo/src> gcc -Os FFT.c FFTDemo.c -o ../FFTDemo_PPC_GCC_OS
10.RAM Disk:FFTDemo/src> /
10.RAM Disk:FFTDemo> FFTDemo_PPC_GCC_OS
Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 5278ms for 4096000 samples, => 8.80x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
10.RAM Disk:FFTDemo>
| Status: Offline |
| | bernd_afa
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 16:58:31
| | [ #136 ] |
| |
|
Cult Member |
Joined: 14-Apr-2006 Posts: 829
From: Unknown | | |
|
| @broadblues
Can you upload the PPC version plus source so others can test? It's strange that it gives such a big speed boost. Maybe you can compile Blender and its speed test with that setting, and it will be faster than on the AmigaOne.
Last edited by bernd_afa on 07-Mar-2012 at 08:36 AM. Last edited by bernd_afa on 06-Mar-2012 at 05:02 PM.
|
| Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 18:34:10
| | [ #137 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
@bernd_afa Please don't make copies and provide them for download. This will quickly cause confusion about the benchmark. Better to send the changes to me and I will update the archive. Your exe gets 34x speed here, by the way, which means it is slightly faster than GCC with 28x. If you think Amiblitz is behind because of the code, I can take a look at it. The C code is pretty "dense", which is good for the C optimizer. But I think I should get close with Amiblitz code.
BTW, how can I easily detect the system it is compiled for, so I can add it to the shell output?
"AMIGA" seems to be set on MOS too. Is there something like 68K, PPC or MC68020 etc.?
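One possible approach, sketched under assumptions: GCC ports usually predefine target macros such as `__mc68000__` / `__m68k__` for 68K and `__PPC__` / `__powerpc__` for PowerPC, and the OS4 and MorphOS toolchains are commonly reported to define `__amigaos4__` and `__MORPHOS__`. The exact set depends on the compiler build, so it is worth dumping the macros of each toolchain (for GCC, preprocessing an empty file with -dM -E) before relying on them. The function names are made up.

```c
#include <assert.h>
#include <stdio.h>

/* Pick a label for the compile target from predefined macros.
 * Assumption: the macro names below are the ones your toolchain defines. */
const char *target_name(void)
{
#if defined(__amigaos4__)
    return "AmigaOS4/PPC";
#elif defined(__MORPHOS__)
    return "MorphOS/PPC";
#elif defined(__mc68000__) || defined(__m68k__)
    return "68K";
#elif defined(__PPC__) || defined(__powerpc__)
    return "PPC";
#elif defined(_WIN32)
    return "x86/Win32";
#else
    return "unknown";
#endif
}

/* Print the label the way the benchmark header line could. */
void print_target(void)
{
    const char *name = target_name();
    assert(name != NULL);
    printf("Speed test for FFT + iFFT: (C, %s, float)\n", name);
}
```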
If I want to get rid of ixemul, there is a compiler switch, right? But then I won't be able to do shell output with printf, will I?
For the benchmark code: I build the sum of various coefficients of the FFT in each loop. I doubt that the optimizer is able to predict this. Also, the input is re-written every loop, so the optimizer cannot detect a loop invariant here. But I can make it even more difficult to predict by initializing the time buffer with white noise that is different every loop. The speedup with -Os on the PPC seems quite suspicious.
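The white-noise idea could look roughly like the sketch below. It is not the benchmark's code: `cplx` stands in for the demo's complex float type, the function names are hypothetical, and a simple 32-bit linear congruential generator is used only because it is small and portable. Carrying the generator state across loops makes every input buffer different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the demo's complex float type. */
typedef struct { float r, i; } cplx;

/* One step of a 32-bit linear congruential generator. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill the time-domain buffer with noise in [-1, 1); imaginary parts are 0.
 * 'state' persists between calls, so each loop sees different input. */
void fill_white_noise(cplx *buf, size_t n, uint32_t *state)
{
    assert(buf != NULL && state != NULL);
    for (size_t k = 0; k < n; k++) {
        /* Use the top 24 bits so the value converts to float exactly. */
        float u = (float)(lcg_next(state) >> 8) / 16777216.0f;
        buf[k].r = 2.0f * u - 1.0f;
        buf[k].i = 0.0f;
    }
}
```

With random input the checksum is no longer a fixed 1.9523, so the expected value would have to come from a reference run with the same seed.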
Last edited by Wanderer on 06-Mar-2012 at 06:40 PM. Last edited by Wanderer on 06-Mar-2012 at 06:37 PM. Last edited by Wanderer on 06-Mar-2012 at 06:36 PM.
| Status: Offline |
| | wawa
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 18:41:32
| | [ #138 ] |
| |
|
Elite Member |
Joined: 21-Jan-2008 Posts: 6259
From: Unknown | | |
|
| | Status: Offline |
| | Tuxedo
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 18:56:22
| | [ #139 ] |
| |
|
Elite Member |
Joined: 28-Nov-2003 Posts: 2341
From: Perugia, ITALY | | |
|
| @Wanderer
On my AmigaOS 4.1 Update 4 Peg2 @ 1131MHz
I get:
8.RAM Disk:FFTDemo> FFTDemo_68K_AB3 Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float) Time needed 14007ms for 4096000 samples, => 3.32x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)
The GCC version, as broadblues said, doesn't work... _________________ Simone"Tuxedo"Monsignori, Perugia, ITALY. |
| Status: Offline |
| | Wanderer
| |
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 18:59:09
| | [ #140 ] |
| |
|
Cult Member |
Joined: 16-Aug-2008 Posts: 654
From: Germany | | |
|
| @wawa
noixemul doesn't let me do formatted output and timing anymore. I would need to write it via the AmigaOS API.
@people please wait with benchmarking, I am making some changes...
| Status: Offline |