itix 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 19:04:26
#141 ]
Elite Member
Joined: 22-Dec-2004
Posts: 3398
From: Freedom world

@Wanderer

Quote:

"AMIGA" seems to be set on MOS too. Is there something like 68K, PPC or MC68020 etc. ?


There is "__MORPHOS__" defined.

Quote:

If I want to get rid of ixemul, there is a compiler switch, right? But I won't be able to do shell output with printf, right?


Use the -noixemul switch and it will link against libnix instead of the ixemul libraries. You still have your printf, fopen and the others. Ixemul just provides a BSD-ish runtime environment where *nix paths are supported and so on. It is only meant for GeekGadgets kind of stuff; it is similar to Cygwin.

_________________
Amiga Developer
Amiga 500, Efika, Mac Mini and PowerBook

itix 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 19:12:00
#142 ]
Elite Member
Joined: 22-Dec-2004
Posts: 3398
From: Freedom world

@Wanderer

Quote:

-noixemul doesn't let me do formatted output and timing anymore. I would need to write it via the AmigaOS API.


-noixemul works fine here on MorphOS when compiled for PowerPC. It must be your compiler.

_________________
Amiga Developer
Amiga 500, Efika, Mac Mini and PowerBook

bernd_afa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 19:40:55
#143 ]
Cult Member
Joined: 14-Apr-2006
Posts: 829
From: Unknown

@Wanderer
> Your exe is btw. 34x speed, means it is slightly faster than GCC with 28x.
> If you think Amiblitz is behind because of the code, I can take a look at it. The C code is pretty "dense" which is good for the C optimizer. But I think I should get close with Amiblitz code.

It is a lot faster. -noixemul works OK with timer output; that is how I did this build. See the attached DevCPP config file, where you can see the compiler settings. When I switch off the optimizer I get the same speed as yours.

About the speed:
Have you tested an unoptimized Visual C build?

The compiler has worked out that the count value le is needed multiplied by 8 for every memory access. So I guess you just need to modify the step and you can avoid a lot of address calculation.

But it is best to test the resulting code to see if it really works. j in the C source seems to be D4 in the asm output, because D4 is not changed in the time-critical loop.

Maybe the asm code is not correct, so an audio test is needed to check that it really works.

for (j = 0; j < le; j++) {
    for (i = j; i < fft->npoints; i += le2) {
        tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
        ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
        buffer[i+le].r = buffer[i].r - tr;
        buffer[i+le].i = buffer[i].i - ti;
        buffer[i].r += tr;
        buffer[i].i += ti;
    }
    /* rest of the outer loop body is not part of this excerpt */
}
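
One way to read the point about address calculation: every butterfly touches buffer[i] and buffer[i+le] several times, and taking both addresses once per iteration leaves nothing to recompute. A minimal sketch in C, assuming a complex sample type with float members r and i as in the loop above. The function name and its parameter list are made up for illustration, and an optimizing compiler may well do this on its own already.

```c
#include <assert.h>

/* Assumed layout of one complex sample, matching the .r/.i accesses above. */
typedef struct { float r; float i; } fftCF;

/* Hypothetical helper: one inner pass for a fixed j, rotating by (ur, ui). */
void butterfly_pass(fftCF *buffer, int npoints, int j, int le, int le2,
                    float ur, float ui)
{
    int i;

    assert(le > 0 && le2 > 0);          /* a zero step would never terminate */

    for (i = j; i < npoints; i += le2) {
        fftCF *lo = &buffer[i];         /* address of buffer[i], taken once */
        fftCF *hi = lo + le;            /* address of buffer[i+le] */
        float tr = hi->r * ur - hi->i * ui;
        float ti = hi->r * ui + hi->i * ur;

        hi->r = lo->r - tr;
        hi->i = lo->i - ti;
        lo->r += tr;
        lo->i += ti;
    }
}
```

Called with le = 1, le2 = 2, ur = 1 and ui = 0 this performs the first pass of the loop above. Whether it is any faster than the indexed form would have to be measured with the benchmark itself.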

Here is more from the 68k asm output:


126DA71C: LEA _cos,A4 ;126DAF58
126DA722: LEA _sin,A3 ;126DAF70
126DA728: FDMOVE.S D1,FP0
126DA72C: FMOVE.D FP0,-8(A5)
126DA732: MOVE.L D0,D6
126DA734: MOVE.L D5,D4
126DA736: ADD.L D5,D5
126DA738: FMOVE.S #+1,FP4
126DA740: FMOVE.S #+0.0,FP5
126DA748: FSMOVE.S D7,FP6
126DA74C: FMOVE.D FP6,-(A7)
126DA750: MOVE.L (A7)+,D2
126DA752: MOVE.L (A7)+,D3
126DA754: MOVE.L D3,-(A7)
126DA756: MOVE.L D2,-(A7)
126DA758: JSR (A4)
126DA75A: MOVE.L D1,-(A7)
126DA75C: MOVE.L D0,-(A7)
126DA75E: FDMOVE.D (A7)+,FP3
126DA762: FSMOVE.X FP3,FP7
126DA766: MOVE.L D2,(A7)
126DA768: MOVE.L D3,4(A7)
126DA76C: JSR (A3)
126DA76E: ADDQ.L #8,A7
126DA770: MOVE.L D1,-(A7)
126DA772: MOVE.L D0,-(A7)
126DA774: FDMOVE.D (A7)+,FP2
126DA778: FDMUL.D -8(A5),FP2
126DA77E: FSMOVE.X FP2,FP6
126DA782: FSMOVE.S D7,FP1
126DA786: FSMUL.S #+5E-1,FP1
126DA78E: FMOVE.S FP1,D7
126DA792: CLR.L D2
126DA794: CMP.L D2,D4
126DA796: BLE __fft+$156 ;126DA83E
126DA79A: MOVEA.L 4(A6),A1
126DA79E: MOVE.L D2,D0
126DA7A0: CMP.L A1,D2
126DA7A2: BGE.S __fft+$12A ;126DA812
126DA7A4: LEA 0(A2,D2.L*8),A0
126DA7A8: MOVE.L D5,D1
126DA7AA: LSL.L #3,D1
126DA7AC: FSMOVE.S 0(A0,D4.L*8),FP0
126DA7B2: FSMOVE.X FP0,FP3
126DA7B6: FSMUL.X FP4,FP3
126DA7BA: FSMOVE.S 4(A0,D4.L*8),FP1
126DA7C0: FSMOVE.X FP1,FP2
126DA7C4: FSMUL.X FP5,FP2
126DA7C8: FSSUB.X FP2,FP3
126DA7CC: FSMUL.X FP5,FP0
126DA7D0: FSMUL.X FP4,FP1
126DA7D4: FSADD.X FP1,FP0
126DA7D8: FSMOVE.S (A0),FP2
126DA7DC: FSSUB.X FP3,FP2
126DA7E0: FMOVE.S FP2,0(A0,D4.L*8)
126DA7E6: FSMOVE.S 4(A0),FP1
126DA7EC: FSSUB.X FP0,FP1
126DA7F0: FMOVE.S FP1,4(A0,D4.L*8)
126DA7F6: FSADD.S (A0),FP3
126DA7FA: FMOVE.S FP3,(A0)
126DA7FE: FSADD.S 4(A0),FP0
126DA804: FMOVE.S FP0,4(A0)
126DA80A: ADD.L D5,D0
126DA80C: ADDA.L D1,A0
126DA80E: CMP.L A1,D0
126DA810: BLT.S __fft+$C4 ;126DA7AC
126DA812: FSMOVE.X FP4,FP3
126DA816: FSMUL.X FP7,FP3
126DA81A: FSMOVE.X FP5,FP0
126DA81E: FSMUL.X FP6,FP0
126DA822: FSMUL.X FP6,FP4
126DA826: FSMUL.X FP7,FP5
126DA82A: FSADD.X FP4,FP5
126DA82E: FSMOVE.X FP3,FP4
126DA832: FSSUB.X FP0,FP4
126DA836: ADDQ.L #1,D2
126DA838: CMP.L D2,D4
126DA83A: BGT __fft+$B6 ;126DA79E

Last edited by bernd_afa on 06-Mar-2012 at 07:51 PM.
Last edited by bernd_afa on 06-Mar-2012 at 07:50 PM.
Last edited by bernd_afa on 06-Mar-2012 at 07:49 PM.
Last edited by bernd_afa on 06-Mar-2012 at 07:43 PM.

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 20:29:07
#144 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@itix

Quote:

Use -noixemul switch and it will link against libnix instead of ixemul libraries.


But libnix needs to be installed for that to work. Depending on where the gcc install came from, that may not be the case.

_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

Kicko 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 20:31:16
#145 ]
Elite Member
Joined: 19-Jun-2004
Posts: 5009
From: Sweden

Here are benchmarks from my X1000:

FFTDemo_68K_AB3
Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float)
Time needed 8632ms for 4096000 samples, => 5.38x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)

FFTDemo_68K_GCC
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 0ms for 4096000 samples, => Infx speed @44100Hz/mono (test=1.9523, should be ~1.9523)

fftbench.exe
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 4900ms for 4096000 samples, => 9.48x speed @44100Hz/mono (test=1.9523, should be ~1.9523)


@Broadblues.

If you make an OS4 native version I could give you my benchmarks for that.

Last edited by Kicko on 06-Mar-2012 at 08:32 PM.

wawa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 20:33:06
#146 ]
Elite Member
Joined: 21-Jan-2008
Posts: 6259
From: Unknown

@broadblues

libnix should be available with the zerohero compilers that come with AmiDevCpp, and the same goes for Bernd's 4.x.x compilers. Wanderer should have these.

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 20:33:45
#147 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Quote:


The speedup with -Os on the PPC seems to be quite suspicious.


Perhaps the code is borderline for fitting in the cache? So with -O2 it misses, while with -Os it fits in the 440's cache?

The X1000 has a much bigger cache, so it might be a test of that theory....

_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 21:25:35
#148 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@broadblues

here are my new results:

Test Results on my Laptop (i5@2.5GHz):

WinUAE + AmiBlitz3
Speed test for FFT/iFFT: (AmiBlitz3, Optimize 7=68020+FPU, float)
Time needed 2622ms for 4096000 samples, => 17.71x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

WinUAE + GCC2.95 O3:
Speed test for FFT/iFFT: (C, 68K/Amiga, float)
Time needed 1520ms for 4096000 samples, => 30.55x speed @44100Hz/mono (checksum=-0.1178, should be ~ -0.1179)

Win32 +VS2008 Debug
Speed test for FFT/iFFT: (C, x86/Win32, float)
Time needed 746ms for 4096000 samples, => 62.25x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)

Win32 + VS2008 Release O2:
Speed test for FFT/iFFT: (C, x86/Win32, float)
Time needed 218ms for 4096000 samples, => 213.03x speed @44100Hz/mono (checksum=-0.1179, should be ~ -0.1179)
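
For anyone comparing runs by hand: the printed "x speed" values are consistent with the audio duration divided by twice the elapsed time. The factor 2 is only inferred from the posted numbers, not taken from the benchmark source, so treat this as a sketch. The function name is made up for illustration.

```c
#include <assert.h>

/* Real-time factor as the benchmark output appears to compute it.
 * The divisor 2 is an inference from the posted results. */
double realtime_factor(long samples, double rate_hz, double elapsed_ms)
{
    double audio_seconds;
    double elapsed_seconds;

    assert(elapsed_ms > 0.0);   /* a 0 ms reading would give an infinite factor */

    audio_seconds = (double)samples / rate_hz;
    elapsed_seconds = elapsed_ms / 1000.0;
    return audio_seconds / (2.0 * elapsed_seconds);
}
```

With 4096000 samples at 44100 Hz, 1520 ms gives roughly 30.55 and 218 ms roughly 213.03, which matches the lines above.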


I am using GCC 2.95.3. When I add "-noixemul", the result "feels" a little faster, but I don't get formatted output. It outputs the text but doesn't replace the %f stuff.
Maybe I should get a newer GCC...
With -Os or -O3, it jumps from 28x to 30x.

I don't know if the code is cache critical. It does a 4096-point FFT. That means the buffer it operates on is 4096*2*sizeof(float) = 32768 bytes. That should easily fit in the cache, no? I could make a smaller FFT, like 1024 points.
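
The buffer arithmetic as a small C sketch, assuming one complex sample is two floats. Both function names are made up for illustration, and the data cache size has to be supplied by the caller since it differs per CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by an n-point complex float buffer (two floats per point). */
size_t fft_buffer_bytes(size_t npoints)
{
    assert(npoints <= (size_t)-1 / (2 * sizeof(float)));  /* no overflow */
    return npoints * 2 * sizeof(float);
}

/* 1 if the whole buffer fits in a data cache of cache_bytes, else 0.
 * Ignores everything else competing for the cache (stack, tables, code
 * in a unified cache), so "fits" here is the optimistic case. */
int fits_in_cache(size_t npoints, size_t cache_bytes)
{
    return fft_buffer_bytes(npoints) <= cache_bytes;
}
```

On a target with 4-byte floats, a 4096-point buffer is 32768 bytes and a 1024-point buffer is 8192 bytes. A buffer that exactly equals the cache size leaves no room for anything else, so the smaller FFT would be the cleaner test.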

@Bernd
The FFT is fully checksummed, which means it must be computed exactly.
The code itself is correct; I use it in many applications under various OSes.


Last edited by Wanderer on 06-Mar-2012 at 09:28 PM.

_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 21:51:16
#149 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Quote:

I am using GCC 2.95.3. When I add "-noixemul", the result "feels" a little faster, but I don't get formatted output. It outputs the text but doesn't replace the %f stuff.
Maybe I should get a newer GCC...


Or try adding the -lm linker flag.

I did another test with -O1 and it ran at about 6x speed. Then I tried with no optimisation and that came out the same as with -O2, so something strange is going on. Somehow -O2 looks like it's not working. That seems unlikely.

_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 22:00:35
#150 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@broadblues

I don't know what was happening with -O2 before, but now it's working as expected.

6.RAM Disk:FFTDemo/src> gcc -o ../FFTDemo_PPP_NOP FFT.c FFTDemo.c
FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_NOP
Speed test for FFT/iFFT: (C, PPC/Amiga, float)
Time needed 19999ms for 4096000 samples, => 2.32x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

6.RAM Disk:FFTDemo/src> gcc -O1 -o ../FFTDemo_PPP_O1 FFT.c FFTDemo.c
FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_O1
Speed test for FFT/iFFT: (C, PPC/Amiga, float)
Time needed 7277ms for 4096000 samples, => 6.38x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

6.RAM Disk:FFTDemo/src> gcc -Os -o ../FFTDemo_PPP_OS FFT.c FFTDemo.c
FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_OS
Speed test for FFT/iFFT: (C, PPC/Amiga, float)
Time needed 5495ms for 4096000 samples, => 8.45x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)

6.RAM Disk:FFTDemo/src> gcc -O2 -o ../FFTDemo_PPP_O2 FFT.c FFTDemo.c
FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
6.RAM Disk:FFTDemo/src> /FFTDemo_PPP_O2
Speed test for FFT/iFFT: (C, PPC/Amiga, float)
Time needed 5022ms for 4096000 samples, => 9.25x speed @44100Hz/mono (checksum=-0.1175, should be ~ -0.1179)
6.RAM Disk:FFTDemo/src>

Last edited by broadblues on 06-Mar-2012 at 10:01 PM.

_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 23:50:17
#151 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@broadblues

I have no idea what -lm does, but now the printf works again with -noixemul, thanks!
The whole thing got a speedup:

gcc FFT.c FFTDemo.c -O3 -o ../FFTDemo_68K_GCC2.95.exe -m68040 -m68881 -Dm68k -lm -noixemul

Speed test for FFT/iFFT: (C, 68K/Amiga, float)
Time needed 1300ms for 4096000 samples, => 35.72x speed @44100Hz/mono (checksum=-0.1178, should be ~ -0.1179)

I also have no idea how using ixemul or not using ixemul can influence the speed of the FFT. Anyway, that seems to be the fastest I can get with GCC 2.95.3 for 68k. It is a little bit faster now than DevCPP.

Could you send me the PPC binary? (I think it's MOS, right?)
Then I can add it to the download (not everybody is a developer and able to compile it himself).

Last edited by Wanderer on 07-Mar-2012 at 12:01 AM.
Last edited by Wanderer on 06-Mar-2012 at 11:56 PM.
Last edited by Wanderer on 06-Mar-2012 at 11:54 PM.

_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 7-Mar-2012 0:21:06
#152 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Quote:

I have no idea what -lm does, but now the printf works again with -noixemul, thanks!


It links in the math library, which, amongst other things, replaces printf with a float-capable version. Some clibs have a separate libm.a, some don't.
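
A quick way to check which printf a given build ended up with is to format a known value and compare the result. This is only a sketch: the function name is made up, and it assumes a non-float printf leaves the %f directive unexpanded or otherwise produces something other than the expected digits, as described earlier in the thread.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if this build's printf family expands %f, 0 otherwise. */
int printf_handles_float(void)
{
    char buf[32];

    sprintf(buf, "%.2f", 1.5);          /* a float-capable printf gives "1.50" */
    return strcmp(buf, "1.50") == 0;
}
```

Building the same file once with and once without -lm (plus -noixemul) and printing the return value should show the difference on a setup like the one above.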

Quote:

Could you send me the PPC binary? (I think it's MOS, right?)


AmigaOS 4

Sent by email.

_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

itix 
Re: HD-REC on OS4.x on X1000 ?
Posted on 7-Mar-2012 6:23:25
#153 ]
Elite Member
Joined: 22-Dec-2004
Posts: 3398
From: Freedom world

@Wanderer

Here are results from Mac mini running MorphOS 2.7:

Quote:

Ram Disk:FFTDemo/src> gg:bin/ppc-morphos-gcc-4.4.5 -noixemul -O3 FFTDemo.c FFT.c -o FFTDemo_MorphOS
FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
Ram Disk:FFTDemo/src> FFTDemo_MorphOS
Speed test for FFT/iFFT: (C, ???/Amiga, float)
Time needed 1250ms for 4096000 samples, => 36.86x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)


Using -Os, -O2 or -O3 optimization flags didn't make any difference.

Your previous version was slightly faster but only by ~50 ms.

AmiBlitz:
Quote:

Ram Disk:FFTDemo> FFTDemo_68K_AB3
Speed test for FFT/iFFT: (AmiBlitz3, Optimize 7=68020+FPU, float)
Time needed 10024ms for 4096000 samples, => 4.63x speed @44100Hz/mono (checksum=-0.1046, should be ~ -0.1179)


68k Dev:
Quote:

Ram Disk:FFTDemo> FFTDemo_68K_Dev.exe
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 6820ms for 4096000 samples, => 6.81x speed @44100Hz/mono (test=1.9523, should be ~1.9523)


68k GCC:
Quote:

Ram Disk:FFTDemo> FFTDemo_68K_GCC2.95.exe
Speed test for FFT/iFFT: (C, 68K/Amiga, float)
Time needed 6540ms for 4096000 samples, => 7.10x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)


Interestingly, if I give a better priority to the Trance JIT, the 68k Dev result improves to 8.0x speed, but the other results stay the same.

Last edited by itix on 07-Mar-2012 at 06:24 AM.

_________________
Amiga Developer
Amiga 500, Efika, Mac Mini and PowerBook

Kicko 
Re: HD-REC on OS4.x on X1000 ?
Posted on 7-Mar-2012 6:54:12
#154 ]
Elite Member
Joined: 19-Jun-2004
Posts: 5009
From: Sweden

A question about the video I am making about HD-Rec: is 40 minutes too long, or should I try to keep it shorter?

bernd_afa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 7-Mar-2012 9:24:11
#155 ]
Cult Member
Joined: 14-Apr-2006
Posts: 829
From: Unknown

@Wanderer
>The FFT is fully checksummed, means it must be computed exactly.
>The code itself is correct, I use it in many applications under various OSes.

OK, then your code is right, but how can native x86 be so fast? The Mac Mini at 1.42 GHz only manages this:

"""
Ram Disk:FFTDemo/src> gg:bin/ppc-morphos-gcc-4.4.5 -noixemul -O3 FFTDemo.c FFT.c -o FFTDemo_MorphOS
FFTDemo.c:36:43: warning: trigraph ??/ ignored, use -trigraphs to enable
Ram Disk:FFTDemo/src> FFTDemo_MorphOS
Speed test for FFT/iFFT: (C, ???/Amiga, float)
Time needed 1250ms for 4096000 samples, => 36.86x speed @44100Hz/mono (checksum=-0.1176, should be ~ -0.1179)
"""""

your value

Time needed 218ms for 4096000 samples, => 213.03x speed @44100Hz/mono (checksum=-0.1179, should be ~ -0.1179)

This means your 2.5 GHz notebook CPU, which is clocked 1.76x higher, is 5.78x faster.
If you downclocked your CPU to 1.42 GHz it would still be about 3.3x faster.

So in this benchmark the performance per MHz of your x86 is about 3.3x better.
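
The same arithmetic as a C sketch, so other result pairs can be plugged in. The function name is made up, and scaling linearly with clock speed is a crude assumption that ignores memory speed, cache sizes and everything else that differs between the two machines.

```c
/* Speed ratio of machine A over machine B, normalised by their clock ratio. */
double per_clock_ratio(double speed_a, double ghz_a,
                       double speed_b, double ghz_b)
{
    double speed_ratio = speed_a / speed_b;   /* how much faster A ran */
    double clock_ratio = ghz_a / ghz_b;       /* how much higher A is clocked */

    return speed_ratio / clock_ratio;
}
```

With 213.03x at 2.5 GHz against 36.86x at 1.42 GHz this comes out at roughly 3.28.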

Can you compile this benchmark for 64-bit too? There the FPU is addressed differently because it has real registers. Maybe it is faster too.

If you do not know how to view the VC asm output, see the link below and check whether it works on your release build.

http://social.msdn.microsoft.com/Forums/en/vcgeneral/thread/c53fd4fd-e239-464a-b512-2b2fc8745c88

Maybe if we find the "trick" we can speed up WinUAE more.

Last edited by bernd_afa on 07-Mar-2012 at 09:24 AM.

realize 
Re: HD-REC on OS4.x on X1000 ?
Posted on 7-Mar-2012 9:59:21
#156 ]
Super Member
Joined: 14-Apr-2003
Posts: 1797
From: nyc

@Kicko

Can't wait for the video. 40 minutes is cool if it's got action, but if there are 2-3 minute pauses then no :) But we are grateful for anything, thanks.

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 8-Mar-2012 11:34:05
#157 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@Bernd

Here is the optimized disassembly:

Quote:

   252: void _fft(fftH *fft, fftCF *buffer, int inverse) {
00401390  sub         esp,0Ch
   253:   float ur,ui,wr,wi,tr,ti;
   254:   int k,j,i;
   255:
   256:   int le  = 0;
   257:   int le2 = 1;
   258:   float w = ((float)M_PI);
   259:   float df=1.0f;
   260:
   261:   if (inverse) df = -1.0f;
00401393  cmp         dword ptr [esp+18h],0
00401398  fld         dword ptr [__real@40490fdb (40310Ch)]
0040139E  fld1            
004013A0  push        ebx  
004013A1  fstp        dword ptr [esp+4]
004013A5  push        ebp  
004013A6  mov         ebx,1
004013AB  je          _fft+27h (4013B7h)
004013AD  fld         dword ptr [__real@bf800000 (403108h)]
004013B3  fstp        dword ptr [esp+8]
   262:
   263:   for (k = 0; k < fft->order; k++) {
004013B7  mov         ebp,dword ptr [esp+18h]
004013BB  cmp         dword ptr [ebp],0
004013BF  mov         dword ptr [esp+0Ch],0
004013C7  jle         _fft+121h (4014B1h)
004013CD  mov         edx,dword ptr [esp+1Ch]
004013D1  push        esi  
004013D2  push        edi  
   264:     le = le2;
   265:     le2 <<= 1;
   266:     ur = 1.0f;
004013D3  fld1            
004013D5  mov         ecx,ebx
   267:     ui = 0.0f;
004013D7  fldz            
   268:
   269:     wr = cosf(w);
   270:     wi = sinf(w) * df;
   271:     w /= 2.0f;
   272:
   273:     for (j = 0; j < le; j++) {
004013D9  xor         eax,eax
004013DB  fld         st(2)
004013DD  add         ebx,ebx
004013DF  fcos            
004013E1  mov         dword ptr [esp+18h],ecx
004013E5  mov         dword ptr [esp+20h],eax
004013E9  fld         st(3)
004013EB  fsin            
004013ED  fmul        dword ptr [esp+10h]
004013F1  fxch        st(4)
004013F3  fmul        dword ptr [__real@3f000000 (403104h)]
004013F9  test        ecx,ecx
004013FB  jle         _fft+129h (4014B9h)
00401401  mov         esi,dword ptr [fft]
00401404  lea         edi,[edx+ecx*8]
00401407  mov         dword ptr [esp+28h],edi
0040140B  jmp         _fft+83h (401413h)
0040140D  fxch        st(2)
0040140F  fxch        st(3)
00401411  fxch        st(2)
   258:   float w = ((float)M_PI);
   259:   float df=1.0f;
   260:
   261:   if (inverse) df = -1.0f;
   262:
   263:   for (k = 0; k < fft->order; k++) {
   264:     le = le2;
   265:     le2 <<= 1;
   266:     ur = 1.0f;
   267:     ui = 0.0f;
   268:
   269:     wr = cosf(w);
   270:     wi = sinf(w) * df;
   271:     w /= 2.0f;
   272:
   273:     for (j = 0; j < le; j++) {
   274:       for (i = j; i < fft->npoints; i += le2) {
00401413  cmp         eax,esi
00401415  jge         _fft+0DDh (40146Dh)
00401417  mov         ecx,dword ptr [esp+28h]
0040141B  lea         edi,[ebx*8]
   275:         tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
00401422  fld         dword ptr [ecx]
00401424  fmul        st,st(4)
00401426  fld         dword ptr [ecx+4]
00401429  fmul        st,st(4)
0040142B  fsubp       st(1),st
   276:         ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
0040142D  fld         dword ptr [ecx]
0040142F  fmul        st,st(4)
00401431  fld         dword ptr [ecx+4]
00401434  fmul        st,st(6)
00401436  faddp       st(1),st
   277:         buffer[i+le].r = buffer[i].r - tr;
00401438  fld         dword ptr [edx+eax*8]
0040143B  fsub        st,st(2)
0040143D  fstp        dword ptr [ecx]
   278:         buffer[i+le].i = buffer[i].i - ti;
0040143F  fld         dword ptr [edx+eax*8+4]
00401443  fsub        st,st(1)
00401445  fstp        dword ptr [ecx+4]
00401448  add         ecx,edi
   279:         buffer[i].r += tr;
0040144A  fld         dword ptr [edx+eax*8]
0040144D  faddp       st(2),st
0040144F  fxch        st(1)
00401451  fstp        dword ptr [edx+eax*8]
   280:         buffer[i].i += ti;
00401454  fadd        dword ptr [edx+eax*8+4]
00401458  fstp        dword ptr [edx+eax*8+4]
0040145C  mov         esi,dword ptr [fft]
0040145F  add         eax,ebx
00401461  cmp         eax,esi
00401463  jl          _fft+92h (401422h)
00401465  mov         eax,dword ptr [esp+20h]
00401469  mov         ecx,dword ptr [esp+18h]
0040146D  add         dword ptr [esp+28h],8
   281:       }
   282:       tr = ur*wr - ui*wi;
00401472  fld         st(1)
00401474  fmul        st,st(4)
00401476  inc         eax  
00401477  cmp         eax,ecx
00401479  fld         st(5)
0040147B  fmul        st,st(4)
0040147D  mov         dword ptr [esp+20h],eax
00401481  fsubp       st(1),st
   283:       ui = ur*wi + ui*wr;
00401483  fld         st(2)
00401485  fmulp       st(4),st
00401487  fld         st(5)
00401489  fmulp       st(5),st
0040148B  fxch        st(3)
0040148D  faddp       st(4),st
0040148F  jl          _fft+7Dh (40140Dh)
00401495  fstp        st(3)
00401497  fstp        st(1)
00401499  fstp        st(0)
0040149B  mov         eax,dword ptr [esp+14h]
0040149F  fstp        st(1)
004014A1  inc         eax  
004014A2  cmp         eax,dword ptr [ebp]
004014A5  mov         dword ptr [esp+14h],eax
004014A9  jl          _fft+43h (4013D3h)
004014AF  pop         edi  
004014B0  pop         esi  
004014B1  pop         ebp  
004014B2  fstp        st(0)
004014B4  pop         ebx  
   284:       ur = tr;
   285:     }
   286:   }
   287: }
004014B5  add         esp,0Ch
004014B8  ret              
   268:
   269:     wr = cosf(w);
   270:     wi = sinf(w) * df;
   271:     w /= 2.0f;
   272:
   273:     for (j = 0; j < le; j++) {
004014B9  fstp        st(2)
004014BB  fstp        st(2)
004014BD  fstp        st(1)
004014BF  jmp         _fft+10Bh (40149Bh)
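
Source lines 282 to 284 in the listing advance the rotation factor (ur, ui) by one complex multiplication with (wr, wi). Pulled out on its own as a C sketch, with a made-up function name and an assumed two-float complex type, it looks like this. The temporary is needed because the new imaginary part still has to read the old real part.

```c
/* Assumed complex type with float members r and i. */
typedef struct { float r; float i; } fftCF;

/* u <- u * w, the same update as lines 282-284 of the listed source. */
void rotate(fftCF *u, fftCF w)
{
    float tr = u->r * w.r - u->i * w.i;   /* new real part, held back */

    u->i = u->r * w.i + u->i * w.r;       /* uses the old u->r */
    u->r = tr;
}
```

Starting from (1, 0) and rotating twice by (0, 1) gives (0, 1) and then (-1, 0).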



Last edited by Wanderer on 08-Mar-2012 at 11:35 AM.

_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 8-Mar-2012 11:36:56
#158 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@Kicko

This is too long. The problem with videos is always keeping them short.

I would try to squeeze it into 10 minutes, or make several videos of 10 minutes each.

_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 8-Mar-2012 11:41:23
#159 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@Bernd

> maybe if we find the "trick" we can speedup winuae more .
The other Bernd (the JIT author) said some years ago that the JIT output is based on the 486 or an even older instruction set (I don't remember, it could be the 386), so there are plenty of possibilities to make it faster on modern machines if you make a compatibility cut.
E.g. different instruction re-ordering to support longer pipelines, use of SIMD instructions, etc.


_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 8-Mar-2012 11:44:16
#160 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@itix

Interesting. I just wonder why the checksum of the AB3 version is so inaccurate. Probably the FPU is running in a different mode.

_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle