Your support is needed and is appreciated as Amigaworld.net is primarily dependent upon the support of its users.

itix
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 19:04:26 [ #141 ]
Elite Member
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world

@Wanderer
Quote:
> "AMIGA" seems to be set on MOS too. Is there something like 68K, PPC or MC68020 etc.?
There is "__MORPHOS__" defined.
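For the question above, a minimal sketch of compile-time platform selection. It assumes the macro names that the respective GCC ports commonly predefine (__MORPHOS__, __amigaos4__, __mc68020__/__mc68040__, __mc68000__); check what your compiler really defines by dumping its predefined macros with gcc -dM -E. The helper name platform_label and its labels are hypothetical. Because MorphOS also defines AMIGA, __MORPHOS__ has to be tested before any generic Amiga check.

```c
/* Hypothetical helper: a label for the build target, chosen at compile time.
   The macro names are assumptions based on common GCC port conventions. */
const char *platform_label(void)
{
#if defined(__MORPHOS__)
    return "PPC/MorphOS";      /* test first: MorphOS also defines AMIGA */
#elif defined(__amigaos4__)
    return "PPC/AmigaOS 4";
#elif defined(__mc68020__) || defined(__mc68040__)
    return "68020+/Amiga";
#elif defined(__mc68000__) || defined(m68k)
    return "68K/Amiga";        /* m68k: defined by hand with -Dm68k in Wanderer's build line */
#elif defined(_WIN32)
    return "x86/Win32";
#else
    return "unknown";
#endif
}
```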
Quote:
> If I want to get rid of ixemul, there is a compiler switch, right? But I won't be able to do shell output with printf, or?
Use the -noixemul switch and it will link against libnix instead of the ixemul libraries. You still have your printf, fopen and the others. Ixemul just provides a BSD-ish runtime environment where *nix paths are supported and so on. It is only meant for GeekGadgets kind of stuff; it is similar to Cygwin.
_________________ Amiga Developer Amiga 500, Efika, Mac Mini and PowerBook

itix
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 19:12:00 [ #142 ]
Elite Member
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world

@Wanderer
Quote:
> -noixemul doesn't let me do formatted output and timing anymore. I would need to write it via the AmigaOS API.
-noixemul works fine here on MorphOS when compiled for PowerPC. It must be your compiler.

bernd_afa
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 19:40:55 [ #143 ]
Cult Member
Joined: 14-Apr-2006 Posts: 829
From: Unknown

@Wanderer
> Your exe is btw. 34x speed, means it is slightly faster than GCC with 28x.
> If you think Amiblitz is behind because of the code, I can take a look at it. The C code is
> pretty "dense" which is good for the C optimizer. But I think I should get close with Amiblitz
> code.
It is a lot faster. -noixemul works OK with timer output. I did this build; see the attached DevCpp config file, where you can see the compiler settings. When I switch off the optimizer, I get the same speed as yours.
About the speed: have you tested an unoptimized Visual C build?
The compiler optimizes it so that the count value le is scaled by 8 on every memory access. So I guess you just need to modify the step, and you can avoid a lot of address calculation.
But best is to test whether the resulting code really works. j in the C source seems to be d4 in the asm output, because d4 is not changed in the time-critical loop.
But maybe the asm code is not correct, so an audio test is needed to see whether it really works.
    for (j = 0; j < le; j++) {
        for (i = j; i < fft->npoints; i += le2) {
            tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
            ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
            buffer[i+le].r = buffer[i].r - tr;
            buffer[i+le].i = buffer[i].i - ti;
            buffer[i].r += tr;
            buffer[i].i += ti;
        }
        tr = ur*wr - ui*wi;   /* advance the twiddle factor */
        ui = ur*wi + ui*wr;
        ur = tr;
    }
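The suggestion to "modify the step" amounts to strength reduction: instead of computing buffer + (i+le)*8 bytes on every access, keep two pointers and advance them by le2 elements per butterfly. A minimal sketch, assuming fftCF is a struct of two floats (r, i), as the field names suggest; the struct layout and both function names are hypothetical, not the FFTDemo code. Optimizing compilers often do this on their own (the Visual C++ listing later in the thread steps one pointer with add ecx,edi), so it is worth measuring before hand-tuning.

```c
typedef struct { float r, i; } fftCF;   /* assumed layout: 2 floats = 8 bytes per sample */

/* Indexed form, as posted: each access computes an address from (i + le) * 8. */
void butterflies_indexed(fftCF *buffer, int npoints, int j, int le, int le2,
                         float ur, float ui)
{
    int i;
    float tr, ti;
    for (i = j; i < npoints; i += le2) {
        tr = buffer[i + le].r * ur - buffer[i + le].i * ui;
        ti = buffer[i + le].r * ui + buffer[i + le].i * ur;
        buffer[i + le].r = buffer[i].r - tr;
        buffer[i + le].i = buffer[i].i - ti;
        buffer[i].r += tr;
        buffer[i].i += ti;
    }
}

/* Strength-reduced form: two pointers advanced by le2 elements per butterfly,
   so the loop body needs no index scaling. Assumes j < le, as in the FFT. */
void butterflies_pointer(fftCF *buffer, int npoints, int j, int le, int le2,
                         float ur, float ui)
{
    fftCF *a, *b;
    float tr, ti;
    int count;

    if (j >= npoints)
        return;
    count = (npoints - 1 - j) / le2 + 1;   /* same trip count as the indexed loop */
    a = buffer + j;                        /* element i      */
    b = a + le;                            /* element i + le */
    for (;;) {
        tr = b->r * ur - b->i * ui;
        ti = b->r * ui + b->i * ur;
        b->r = a->r - tr;
        b->i = a->i - ti;
        a->r += tr;
        a->i += ti;
        if (--count == 0)
            break;                         /* do not step past the last pair */
        a += le2;
        b += le2;
    }
}
```

Both functions perform the same arithmetic in the same order; only the address computation differs.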
Here is more of the 68k asm output:
    126DA71C: LEA _cos,A4 ;126DAF58
    126DA722: LEA _sin,A3 ;126DAF70
    126DA728: FDMOVE.S D1,FP0
    126DA72C: FMOVE.D FP0,-8(A5)
    126DA732: MOVE.L D0,D6
    126DA734: MOVE.L D5,D4
    126DA736: ADD.L D5,D5
    126DA738: FMOVE.S #+1,FP4
    126DA740: FMOVE.S #+0.0,FP5
    126DA748: FSMOVE.S D7,FP6
    126DA74C: FMOVE.D FP6,-(A7)
    126DA750: MOVE.L (A7)+,D2
    126DA752: MOVE.L (A7)+,D3
    126DA754: MOVE.L D3,-(A7)
    126DA756: MOVE.L D2,-(A7)
    126DA758: JSR (A4)
    126DA75A: MOVE.L D1,-(A7)
    126DA75C: MOVE.L D0,-(A7)
    126DA75E: FDMOVE.D (A7)+,FP3
    126DA762: FSMOVE.X FP3,FP7
    126DA766: MOVE.L D2,(A7)
    126DA768: MOVE.L D3,4(A7)
    126DA76C: JSR (A3)
    126DA76E: ADDQ.L #8,A7
    126DA770: MOVE.L D1,-(A7)
    126DA772: MOVE.L D0,-(A7)
    126DA774: FDMOVE.D (A7)+,FP2
    126DA778: FDMUL.D -8(A5),FP2
    126DA77E: FSMOVE.X FP2,FP6
    126DA782: FSMOVE.S D7,FP1
    126DA786: FSMUL.S #+5E-1,FP1
    126DA78E: FMOVE.S FP1,D7
    126DA792: CLR.L D2
    126DA794: CMP.L D2,D4
    126DA796: BLE __fft+$156 ;126DA83E
    126DA79A: MOVEA.L 4(A6),A1
    126DA79E: MOVE.L D2,D0
    126DA7A0: CMP.L A1,D2
    126DA7A2: BGE.S __fft+$12A ;126DA812
    126DA7A4: LEA 0(A2,D2.L*8),A0
    126DA7A8: MOVE.L D5,D1
    126DA7AA: LSL.L #3,D1
    126DA7AC: FSMOVE.S 0(A0,D4.L*8),FP0
    126DA7B2: FSMOVE.X FP0,FP3
    126DA7B6: FSMUL.X FP4,FP3
    126DA7BA: FSMOVE.S 4(A0,D4.L*8),FP1
    126DA7C0: FSMOVE.X FP1,FP2
    126DA7C4: FSMUL.X FP5,FP2
    126DA7C8: FSSUB.X FP2,FP3
    126DA7CC: FSMUL.X FP5,FP0
    126DA7D0: FSMUL.X FP4,FP1
    126DA7D4: FSADD.X FP1,FP0
    126DA7D8: FSMOVE.S (A0),FP2
    126DA7DC: FSSUB.X FP3,FP2
    126DA7E0: FMOVE.S FP2,0(A0,D4.L*8)
    126DA7E6: FSMOVE.S 4(A0),FP1
    126DA7EC: FSSUB.X FP0,FP1
    126DA7F0: FMOVE.S FP1,4(A0,D4.L*8)
    126DA7F6: FSADD.S (A0),FP3
    126DA7FA: FMOVE.S FP3,(A0)
    126DA7FE: FSADD.S 4(A0),FP0
    126DA804: FMOVE.S FP0,4(A0)
    126DA80A: ADD.L D5,D0
    126DA80C: ADDA.L D1,A0
    126DA80E: CMP.L A1,D0
    126DA810: BLT.S __fft+$C4 ;126DA7AC
    126DA812: FSMOVE.X FP4,FP3
    126DA816: FSMUL.X FP7,FP3
    126DA81A: FSMOVE.X FP5,FP0
    126DA81E: FSMUL.X FP6,FP0
    126DA822: FSMUL.X FP6,FP4
    126DA826: FSMUL.X FP7,FP5
    126DA82A: FSADD.X FP4,FP5
    126DA82E: FSMOVE.X FP3,FP4
    126DA832: FSSUB.X FP0,FP4
    126DA836: ADDQ.L #1,D2
    126DA838: CMP.L D2,D4
    126DA83A: BGT __fft+$B6 ;126DA79E
Last edited by bernd_afa on 06-Mar-2012 at 07:51 PM.

broadblues
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 20:29:07 [ #144 ]
Amiga Developer Team
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England

@itix
Quote:
> Use -noixemul switch and it will link against libnix instead of ixemul libraries.
But libnix needs to be installed for that to work. Depending on where the GCC install came from, that may not be the case.
_________________ BroadBlues On Blues BroadBlues On Amiga Walker Broad

Kicko
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 20:31:16 [ #145 ]
Elite Member
Joined: 19-Jun-2004 Posts: 5009
From: Sweden

Here are the benchmarks from my X1000:
FFTDemo_68K_AB3 Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float) Time needed 8632ms for 4096000 samples, => 5.38x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)
FFTDemo_68K_GCC Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 0ms for 4096000 samples, => Infx speed @44100Hz/mono (test=1.9523, should be ~1.9523)
fftbench.exe Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 4900ms for 4096000 samples, => 9.48x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
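The "x speed" figures in these results appear to follow speed = samples / (2 × 44100 × seconds): for example, 4096000 samples in 4900 ms gives 9.48x, and 218 ms gives 213.03x. This formula is inferred from the reported numbers, not taken from the FFTDemo source, and the meaning of the factor 2 is an assumption. It also explains the "Infx speed" line: a 0 ms reading divides by zero, which yields infinity in IEEE floating point. The helper name realtime_factor is hypothetical.

```c
/* Hypothetical helper reproducing the benchmark's "x speed" figure.
   Assumption: the factor 2 matches the reported numbers (presumably one FFT
   plus one iFFT per block). */
double realtime_factor(long samples, double elapsed_ms, double sample_rate)
{
    double audio_seconds = (double)samples / sample_rate;  /* 4096000 / 44100 Hz, about 92.9 s */
    double cpu_seconds = elapsed_ms / 1000.0;
    /* With IEEE 754 arithmetic, 0 ms gives +infinity: the "Infx speed" line. */
    return audio_seconds / (2.0 * cpu_seconds);
}
```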
@Broadblues
If you make an OS4 native version, I could give you my benchmarks for it.
Last edited by Kicko on 06-Mar-2012 at 08:32 PM.

wawa
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 20:33:06 [ #146 ]
Elite Member
Joined: 21-Jan-2008 Posts: 6259
From: Unknown

@broadblues
libnix should be available with the zerohero compilers that come with AmiDevCpp, and likewise with Bernd's 4.x.x compilers. Wanderer should have these.

broadblues
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 20:33:45 [ #147 ]
Amiga Developer Team
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England

@Wanderer
Quote:
> The speedup with -Os on the PPC seems to be quite suspicious.
Perhaps the code is borderline for fitting in the cache? So with -O2 it misses, but with -Os it fits in the 440's cache?
The X1000 has a much bigger cache, so it might be a test of that theory...

Wanderer
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 21:25:35 [ #148 ]
Cult Member
Joined: 16-Aug-2008 Posts: 654
From: Germany

@broadblues
here are my new results:
Test Results on my Laptop (i5@2.5GHz):
WinUAE + AmiBlitz3 Speed test for FFT/iFFT: (AmiBlitz3, Optimize 7=68020+FPU, float) Time needed 2622ms for 4096000 samples, => 17.71x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)
WinUAE + GCC2.95 O3: Speed test for FFT/iFFT: (C, 68K/Amiga, float) Time needed 1520ms for 4096000 samples, => 30.55x speed @44100Hz/mono (checksum=-0.1178, should be ~ -0.1179)
Win32 +VS2008 Debug Speed test for FFT/iFFT: (C, x86/Win32, float) Time needed 746ms for 4096000 samples, => 62.25x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)
Win32 + VS2008 Release O2: Speed test for FFT/iFFT: (C, x86/Win32, float) Time needed 218ms for 4096000 samples, => 213.03x speed @44100Hz/mono (checksum=-0.1179, should be ~ -0.1179)
I am using GCC 2.95.3. When I add "-noixemul", the result "feels" a little faster, but I don't get formatted output: it outputs the text but doesn't replace the %f stuff. Maybe I should get a newer GCC... With -Os or -O3, it jumps from 28x to 30x.
I don't know if the code is cache-critical. It does a 4096-point FFT. That means the buffer it operates on is 4096*2*sizeof(float) = 32768 bytes. That should easily fit in the cache, no? I could make a smaller FFT, like 1024 points.
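The buffer arithmetic above can be checked, and compared against a given cache size, with a couple of lines. This sketch assumes the complex buffer is an array of two-float structs; the cache size is a parameter you have to look up for the particular CPU, and the helper names are hypothetical. It deliberately ignores the stack, twiddle factors and code, which compete for the same cache.

```c
#include <stddef.h>

typedef struct { float r, i; } fftCF;   /* assumed: one complex sample = 2 floats */

/* Bytes touched by an in-place FFT over npoints complex samples. */
size_t fft_buffer_bytes(size_t npoints)
{
    return npoints * sizeof(fftCF);     /* 4096 * 8 = 32768 with 4-byte floats */
}

/* 1 if the buffer alone fits in a data cache of cache_bytes, else 0. */
int fft_buffer_fits(size_t npoints, size_t cache_bytes)
{
    return fft_buffer_bytes(npoints) <= cache_bytes;
}
```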
@Bernd The FFT is fully checksummed, means it must be computed exactly. The code itself is correct, I use it in many applications under various OSes.
Last edited by Wanderer on 06-Mar-2012 at 09:28 PM.
_________________ -- Author of HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more... Homepage: http://www.hd-rec.de

broadblues
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 21:51:16 [ #149 ]
Amiga Developer Team
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England

@Wanderer
Quote:
> I am using GCC 2.95.3. When I add "-noixemul", the result "feels" a little faster, but I don't get formatted output. It outputs the text but doesn't replace the %f stuff. Maybe I should get a newer GCC...
Or try adding the -lm flag.
I did another test with -O1 and it ran at about 6x speed. Then with no optimisation, and that came out the same as with -O2. Something strange is going on: somehow -O2 looks like it's not working. That seems unlikely.

broadblues
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 22:00:35 [ #150 ]
Amiga Developer Team
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England

@broadblues
I don't know what was happening with -O2 before, but now it's working as expected.
    6.RAM Disk:FFTDemo/src> gcc -o ../FFTDemo_PPP_NOP FFT.c FFTDemo.c
    FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
    6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_NOP
    Speed test for FFT/iFFT: (C, PPC/Amiga, float) Time needed 19999ms for 4096000 samples, => 2.32x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

    6.RAM Disk:FFTDemo/src> gcc -O1 -o ../FFTDemo_PPP_O1 FFT.c FFTDemo.c
    FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
    6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_O1
    Speed test for FFT/iFFT: (C, PPC/Amiga, float) Time needed 7277ms for 4096000 samples, => 6.38x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

    6.RAM Disk:FFTDemo/src> gcc -Os -o ../FFTDemo_PPP_OS FFT.c FFTDemo.c
    FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
    6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_OS
    Speed test for FFT/iFFT: (C, PPC/Amiga, float) Time needed 5495ms for 4096000 samples, => 8.45x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

    6.RAM Disk:FFTDemo/src> gcc -O2 -o ../FFTDemo_PPP_O2 FFT.c FFTDemo.c
    FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
    6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_O2
    Speed test for FFT/iFFT: (C, PPC/Amiga, float) Time needed 5022ms for 4096000 samples, => 9.25x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)
    6.RAM Disk:FFTDemo/src>
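The repeated warning refers to the character sequence ??/ at line 36, column 43 of FFTDemo.c. In ISO C, ??/ is a trigraph for a backslash; GCC ignores trigraphs unless -trigraphs (or a strict ISO -std mode) is given, which is why the MorphOS build still prints "???/Amiga". Judging from that output, the sequence probably sits in the fallback CPU label string; that is an assumption, since the source line is not shown in the thread. A minimal sketch of two spellings that mean the same thing whether or not trigraphs are enabled (the names are hypothetical):

```c
/* "??/" would turn into a backslash if trigraphs were enabled.
   Both spellings below yield the characters ? ? ? / without forming a trigraph. */
static const char label_escaped[] = "??\?/Amiga";     /* \? is a standard escape for '?' */
static const char label_split[]   = "???" "/Amiga";   /* literals are joined after trigraph processing */

const char *fallback_label(int which)
{
    return which ? label_split : label_escaped;
}
```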
Last edited by broadblues on 06-Mar-2012 at 10:01 PM.

Wanderer
Re: HD-REC on OS4.x on X1000 ? Posted on 6-Mar-2012 23:50:17 [ #151 ]
Cult Member
Joined: 16-Aug-2008 Posts: 654
From: Germany

@broadblues
I have no idea what -lm does, but now the printf works again with -noixemul, thanks! The whole thing got a speedup:
gcc FFT.c FFTDemo.c -O3 -o ../FFTDemo_68K_GCC2.95.exe -m68040 -m68881 -Dm68k -lm -noixemul
Speed test for FFT/iFFT: (C, 68K/Amiga, float) Time needed 1300ms for 4096000 samples, => 35.72x speed @44100Hz/mono (checksum=-0.1178, should be ~ -0.1179)
I also have no idea how using ixemul or not using ixemul can influence the speed of the FFT. Anyway, that seems to be the fastest I can get with GCC2.95.3 for 68k. A little bit faster now than DevCPP.
Could you send me the PPC binary? (I think it's MOS, right?) Then I can add it to the download (not everybody is a developer and able to compile it himself).
Last edited by Wanderer on 07-Mar-2012 at 12:01 AM.

broadblues
Re: HD-REC on OS4.x on X1000 ? Posted on 7-Mar-2012 0:21:06 [ #152 ]
Amiga Developer Team
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England

@Wanderer
Quote:
> I have no idea what -lm does, but now the printf works again with -noixemul, thanks!
It links in the math library, which also replaces printf with a float-capable version, amongst other things. Some C libraries have a separate libm.a, some don't.
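A small smoke test for the symptom Wanderer describes (%f printed literally instead of the number). It formats the expected checksum from the benchmark output and checks the result; it assumes that a printf family without float support simply fails to produce the digits. The function name is hypothetical, and whether libnix needs -lm for this depends on the C library, as noted above.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the linked printf family formats floating point, else 0.
   Build line from this thread: gcc ... -noixemul -lm */
int printf_formats_floats(void)
{
    char buf[32];
    sprintf(buf, "%.4f", -0.1179);      /* expected checksum from the benchmark output */
    return strcmp(buf, "-0.1179") == 0;
}
```

Calling it once at startup and printing a warning with puts() when it returns 0 would flag a build that was linked without float formatting.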
Quote:
> Could you send me the PPC binary (I think it's MOS, right?)
AmigaOS 4.
Sent by email.

itix
Re: HD-REC on OS4.x on X1000 ? Posted on 7-Mar-2012 6:23:25 [ #153 ]
Elite Member
Joined: 22-Dec-2004 Posts: 3398
From: Freedom world

@Wanderer
Here are results from Mac mini running MorphOS 2.7:
Quote:
> Ram Disk:FFTDemo/src> gg:bin/ppc-morphos-gcc-4.4.5 -noixemul -O3 FFTDemo.c FFT.c -o FFTDemo_MorphOS
> FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
> Ram Disk:FFTDemo/src> FFTDemo_MorphOS
> Speed test for FFT/iFFT: (C, ???/Amiga, float) Time needed 1250ms for 4096000 samples, => 36.86x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)
Using -Os, -O2 or -O3 optimization flags didn't make any difference.
Your previous version was slightly faster but only by ~50 ms.
AmiBlitz:
Quote:
> Ram Disk:FFTDemo> FFTDemo_68K_AB3
> Speed test for FFT/iFFT: (AmiBlitz3, Optimize 7=68020+FPU, float) Time needed 10024ms for 4096000 samples, => 4.63x speed @44100Hz/mono (checksum=-0.1046, should be ~ -0.1179)
68k Dev:
Quote:
> Ram Disk:FFTDemo> FFTDemo_68K_Dev.exe
> Speed test for FFT + iFFT: (C, 68040+FPU, float) Time needed 6820ms for 4096000 samples, => 6.81x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
68k GCC:
Quote:
> Ram Disk:FFTDemo> FFTDemo_68K_GCC2.95.exe
> Speed test for FFT/iFFT: (C, 68K/Amiga, float) Time needed 6540ms for 4096000 samples, => 7.10x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)
Interestingly, if I give the Trance JIT a higher priority, the 68k Dev result improves to 8.0x speed, but the other results stay the same.
Last edited by itix on 07-Mar-2012 at 06:24 AM.

Kicko
Re: HD-REC on OS4.x on X1000 ? Posted on 7-Mar-2012 6:54:12 [ #154 ]
Elite Member
Joined: 19-Jun-2004 Posts: 5009
From: Sweden

A question about the video I am making about HD-Rec: is 40 minutes too long, or should I try to keep it shorter?

bernd_afa
Re: HD-REC on OS4.x on X1000 ? Posted on 7-Mar-2012 9:24:11 [ #155 ]
Cult Member
Joined: 14-Apr-2006 Posts: 829
From: Unknown

@Wanderer
> The FFT is fully checksummed, means it must be computed exactly.
> The code itself is correct, I use it in many applications under various OSes.
OK, then your code is right, but how can native x86 be so fast? The Mac mini at 1.42 GHz only does this:
Quote:
> Ram Disk:FFTDemo/src> gg:bin/ppc-morphos-gcc-4.4.5 -noixemul -O3 FFTDemo.c FFT.c -o FFTDemo_MorphOS
> FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
> Ram Disk:FFTDemo/src> FFTDemo_MorphOS
> Speed test for FFT/iFFT: (C, ???/Amiga, float) Time needed 1250ms for 4096000 samples, => 36.86x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)
Your value:
> Time needed 218ms for 4096000 samples, => 213.03x speed @44100Hz/mono (checksum=-0.1179, should be ~ -0.1179)
This means your 2.5 GHz notebook CPU, which is clocked 1.76x higher, is about 5.78x faster. And if you downclocked your CPU to 1.42 GHz, it would still be about 3.3x faster.
So performance per MHz on your x86 is about 3.3x better in this benchmark.
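The per-clock comparison works out as follows, using the two reported speeds (213.03x and 36.86x) and the two clock rates (2.5 GHz and 1.42 GHz). It assumes performance scales linearly with clock, which ignores memory and cache effects.

```latex
\[
\frac{213.03}{36.86} \approx 5.78, \qquad
\frac{2.5\ \text{GHz}}{1.42\ \text{GHz}} \approx 1.76, \qquad
\frac{5.78}{1.76} \approx 3.28
\]
```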
Can you compile this benchmark for 64-bit too? There the FPU is addressed differently, because it has real registers (SSE) instead of the x87 register stack. Maybe it is faster too.
If you don't know how to view the Visual C++ asm output, check whether the method at this link works with your release build:
http://social.msdn.microsoft.com/Forums/en/vcgeneral/thread/c53fd4fd-e239-464a-b512-2b2fc8745c88
Maybe if we find the "trick", we can speed up WinUAE more.
Last edited by bernd_afa on 07-Mar-2012 at 09:24 AM.

realize
Re: HD-REC on OS4.x on X1000 ? Posted on 7-Mar-2012 9:59:21 [ #156 ]
Super Member
Joined: 14-Apr-2003 Posts: 1797
From: nyc

@Kicko
Can't wait for the video. 40 minutes is cool if it's got action, but if there are 2-3 minute pauses, then no :) But we are grateful for anything, thanks.

Wanderer
Re: HD-REC on OS4.x on X1000 ? Posted on 8-Mar-2012 11:34:05 [ #157 ]
Cult Member
Joined: 16-Aug-2008 Posts: 654
From: Germany

@Bernd
Here is the optimized disassembly:
Quote:
    252: void _fft(fftH *fft, fftCF *buffer, int inverse) {
    00401390 sub esp,0Ch
    253: float ur,ui,wr,wi,tr,ti;
    254: int k,j,i;
    255:
    256: int le = 0;
    257: int le2 = 1;
    258: float w = ((float)M_PI);
    259: float df=1.0f;
    260:
    261: if (inverse) df = -1.0f;
    00401393 cmp dword ptr [esp+18h],0
    00401398 fld dword ptr [__real@40490fdb (40310Ch)]
    0040139E fld1
    004013A0 push ebx
    004013A1 fstp dword ptr [esp+4]
    004013A5 push ebp
    004013A6 mov ebx,1
    004013AB je _fft+27h (4013B7h)
    004013AD fld dword ptr [__real@bf800000 (403108h)]
    004013B3 fstp dword ptr [esp+8]
    262:
    263: for (k = 0; k < fft->order; k++) {
    004013B7 mov ebp,dword ptr [esp+18h]
    004013BB cmp dword ptr [ebp],0
    004013BF mov dword ptr [esp+0Ch],0
    004013C7 jle _fft+121h (4014B1h)
    004013CD mov edx,dword ptr [esp+1Ch]
    004013D1 push esi
    004013D2 push edi
    264: le = le2;
    265: le2 <<= 1;
    266: ur = 1.0f;
    004013D3 fld1
    004013D5 mov ecx,ebx
    267: ui = 0.0f;
    004013D7 fldz
    268:
    269: wr = cosf(w);
    270: wi = sinf(w) * df;
    271: w /= 2.0f;
    272:
    273: for (j = 0; j < le; j++) {
    004013D9 xor eax,eax
    004013DB fld st(2)
    004013DD add ebx,ebx
    004013DF fcos
    004013E1 mov dword ptr [esp+18h],ecx
    004013E5 mov dword ptr [esp+20h],eax
    004013E9 fld st(3)
    004013EB fsin
    004013ED fmul dword ptr [esp+10h]
    004013F1 fxch st(4)
    004013F3 fmul dword ptr [__real@3f000000 (403104h)]
    004013F9 test ecx,ecx
    004013FB jle _fft+129h (4014B9h)
    00401401 mov esi,dword ptr [fft]
    00401404 lea edi,[edx+ecx*8]
    00401407 mov dword ptr [esp+28h],edi
    0040140B jmp _fft+83h (401413h)
    0040140D fxch st(2)
    0040140F fxch st(3)
    00401411 fxch st(2)
    258: float w = ((float)M_PI);
    259: float df=1.0f;
    260:
    261: if (inverse) df = -1.0f;
    262:
    263: for (k = 0; k < fft->order; k++) {
    264: le = le2;
    265: le2 <<= 1;
    266: ur = 1.0f;
    267: ui = 0.0f;
    268:
    269: wr = cosf(w);
    270: wi = sinf(w) * df;
    271: w /= 2.0f;
    272:
    273: for (j = 0; j < le; j++) {
    274: for (i = j; i < fft->npoints; i += le2) {
    00401413 cmp eax,esi
    00401415 jge _fft+0DDh (40146Dh)
    00401417 mov ecx,dword ptr [esp+28h]
    0040141B lea edi,[ebx*8]
    275: tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
    00401422 fld dword ptr [ecx]
    00401424 fmul st,st(4)
    00401426 fld dword ptr [ecx+4]
    00401429 fmul st,st(4)
    0040142B fsubp st(1),st
    276: ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
    0040142D fld dword ptr [ecx]
    0040142F fmul st,st(4)
    00401431 fld dword ptr [ecx+4]
    00401434 fmul st,st(6)
    00401436 faddp st(1),st
    277: buffer[i+le].r = buffer[i].r - tr;
    00401438 fld dword ptr [edx+eax*8]
    0040143B fsub st,st(2)
    0040143D fstp dword ptr [ecx]
    278: buffer[i+le].i = buffer[i].i - ti;
    0040143F fld dword ptr [edx+eax*8+4]
    00401443 fsub st,st(1)
    00401445 fstp dword ptr [ecx+4]
    00401448 add ecx,edi
    279: buffer[i].r += tr;
    0040144A fld dword ptr [edx+eax*8]
    0040144D faddp st(2),st
    0040144F fxch st(1)
    00401451 fstp dword ptr [edx+eax*8]
    280: buffer[i].i += ti;
    00401454 fadd dword ptr [edx+eax*8+4]
    00401458 fstp dword ptr [edx+eax*8+4]
    0040145C mov esi,dword ptr [fft]
    0040145F add eax,ebx
    00401461 cmp eax,esi
    00401463 jl _fft+92h (401422h)
    00401465 mov eax,dword ptr [esp+20h]
    00401469 mov ecx,dword ptr [esp+18h]
    0040146D add dword ptr [esp+28h],8
    281: }
    282: tr = ur*wr - ui*wi;
    00401472 fld st(1)
    00401474 fmul st,st(4)
    00401476 inc eax
    00401477 cmp eax,ecx
    00401479 fld st(5)
    0040147B fmul st,st(4)
    0040147D mov dword ptr [esp+20h],eax
    00401481 fsubp st(1),st
    283: ui = ur*wi + ui*wr;
    00401483 fld st(2)
    00401485 fmulp st(4),st
    00401487 fld st(5)
    00401489 fmulp st(5),st
    0040148B fxch st(3)
    0040148D faddp st(4),st
    0040148F jl _fft+7Dh (40140Dh)
    00401495 fstp st(3)
    00401497 fstp st(1)
    00401499 fstp st(0)
    0040149B mov eax,dword ptr [esp+14h]
    0040149F fstp st(1)
    004014A1 inc eax
    004014A2 cmp eax,dword ptr [ebp]
    004014A5 mov dword ptr [esp+14h],eax
    004014A9 jl _fft+43h (4013D3h)
    004014AF pop edi
    004014B0 pop esi
    004014B1 pop ebp
    004014B2 fstp st(0)
    004014B4 pop ebx
    284: ur = tr;
    285: }
    286: }
    287: }
    004014B5 add esp,0Ch
    004014B8 ret
    268:
    269: wr = cosf(w);
    270: wi = sinf(w) * df;
    271: w /= 2.0f;
    272:
    273: for (j = 0; j < le; j++) {
    004014B9 fstp st(2)
    004014BB fstp st(2)
    004014BD fstp st(1)
    004014BF jmp _fft+10Bh (40149Bh)
Last edited by Wanderer on 08-Mar-2012 at 11:35 AM.

Wanderer
Re: HD-REC on OS4.x on X1000 ? Posted on 8-Mar-2012 11:36:56 [ #158 ]
Cult Member
Joined: 16-Aug-2008 Posts: 654
From: Germany

@Kicko
This is too long. The problem with videos is always keeping them short.
I would try to squeeze it into 10 minutes, or make several videos of 10 minutes each.

Wanderer
Re: HD-REC on OS4.x on X1000 ? Posted on 8-Mar-2012 11:41:23 [ #159 ]
Cult Member
Joined: 16-Aug-2008 Posts: 654
From: Germany

@Bernd
> Maybe if we find the "trick", we can speed up WinUAE more.
The other Bernd (the JIT author) said some years ago that the JIT output is based on the 486 or even an older instruction set (I don't remember, could be the 386), and thus there are plenty of possibilities to make it faster on modern machines if you make a compatibility cut, e.g. different re-ordering to support longer pipelines, using SIMD instructions, etc.

Wanderer
Re: HD-REC on OS4.x on X1000 ? Posted on 8-Mar-2012 11:44:16 [ #160 ]
Cult Member
Joined: 16-Aug-2008 Posts: 654
From: Germany

@itix
Interesting. I just wonder why the checksum of the AB3 version is so inaccurate. Probably the FPU is running in a different mode.
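As a rough illustration of how much the precision of intermediate results alone can move a sum: the sketch below adds the same float constant once with every intermediate rounded to single precision and once in double precision. It is not a model of what the AB3 build or the Trance JIT actually do; it only shows that code generators that round at different points (the 68k listing above uses FSMUL/FSADD, which round to single precision) can legitimately produce different checksums. The function names are hypothetical.

```c
/* Sum `value` n times, rounding every intermediate result to float.
   volatile forces a store (and thus rounding to float) even on FPUs
   that otherwise keep extended precision in registers. */
float sum_float(float value, long n)
{
    volatile float acc = 0.0f;
    long k;
    for (k = 0; k < n; k++)
        acc = acc + value;
    return acc;
}

/* Same sum with a double-precision accumulator. */
double sum_double(float value, long n)
{
    double acc = 0.0;
    long k;
    for (k = 0; k < n; k++)
        acc += value;
    return acc;
}
```

Summing 0.1f one million times, the double version lands very close to 100000, while the single-precision version drifts by several hundred.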