Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 27-Feb-2012 14:09:19
#121 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@broadblues

You can also use it in the Wave Editor; that should work. You can use the pre-listen function or, if realtime is not possible, render it to a test sample, listen, and then undo.

At the beginning I was also aiming for a steeper filter, but people who are more knowledgeable than me told me to use less steepness.

The reasons are the problems with sound events at the filter-band boundaries, and the ringing introduced by IIR filters. The steeper the filter, the stronger the ringing. The only way around that is to use more coefficients for the filter, but then you also need more CPU power.

In many situations those things won't be noticeable, of course. But if the filters are accurate and the X-frequency-jumping problem is reduced, I simply have a better conscience running the effect over my precious master recording...

If I have time I will experiment with steeper filters. Until now, I have had quite good results with the 12 dB filters.

_________________
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 27-Feb-2012 14:49:59
#122 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Quote:

and the ringing introduced by IIR filters.


I have Jamin set to use FFT filters (which is the default). It's an awfully long time since I studied digital filters at university, so I can't remember the difference, but could this be a factor in producing a less distorted steep filter?

Anyway, I'm certainly no expert in this field. I will keep trying it out when I need to master a video sound check etc., because the less often I have to switch my Linux box on, the better!



_________________
BroadBlues On Blues BroadBlues On Amiga Walker Broad

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 27-Feb-2012 15:57:53
#123 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@broadblues

Yes, FFT allows you to do very steep filters with low ringing (e.g. with proper Hann windowing).

But that is too heavy for your SAM. See the FFT benchmarks in this thread. Even on WinUAE, I would not want the multiband compressor to eat up more than, let's say, 10% of my CPU, since I have plenty of other stuff running.
I used the FFT only in the Denoiser, since it is essential there, and it is a typical "offline" effect that you render into the wave sample.

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 4-Mar-2012 22:04:48
#124 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@Thread

I have written an interesting benchmark here: FFTDemo

It is basically the same as the FFT demo posted earlier by Bernd, but it is available as Amiblitz and C code. I compiled binaries for Windows and 68K Amiga.
There is an additional test number that is a checksum of the FFT calculations. This ensures that the FFT must actually be computed and cannot be bypassed.

Here are my (surprising, or not so surprising) results:

Test Results on my Laptop (i5@2.5GHz):

WinUAE + GCC:
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 3620ms for 4096000 samples, => 12.83x speed @44100Hz/mono (test=1.9523, should be ~1.9523)

WinUAE + AmiBlitz3
Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float)
Time needed 2577ms for 4096000 samples, => 18.02x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)

Win32:
Speed test for FFT + iFFT: (C, x86/Win32, float)
Time needed 223ms for 4096000 samples, => 208.25x speed @44100Hz/mono (test=1.9523, should be ~1.9523)


What does that mean?

1. WinUAE loses a factor of at least 10(!) when running 68K code compared with native code.

2. GCC is slower than Amiblitz, or I misconfigured the optimizer.
Both the GCC and the Amiblitz versions use a quite naive implementation of the FFT; nothing special was done at the code level to make it particularly fast.

3. You can easily create a PPC-native version of this test for MOS and OS4 to see how fast native code gets, and how fast the 68K emulation is.

The results are consistent with the earlier test. The only differences are that I run 10x more loops to be more accurate, and that it tests a mono FFT, so the RTF (real-time factor) result is twice as high because it runs on a single channel instead of two.
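
The "x speed" figures in the benchmark output can be reproduced from the sample count and the elapsed time. The sketch below is an assumption worked backwards from the printed numbers, not taken from the FFTDemo source: the figures only match if the audio duration is divided by twice the elapsed time, and the thread does not say where that factor of 2 comes from. The function name is hypothetical.

```c
#include <assert.h>

/* Real-time factor as the benchmark output appears to compute it.
   samples    : number of samples processed (4096000 in the thread)
   elapsed_ms : measured run time in milliseconds
   rate_hz    : sample rate (44100 in the thread) */
double realtime_factor(long samples, long elapsed_ms, double rate_hz)
{
    double audio_seconds, elapsed_seconds;
    assert(samples > 0 && elapsed_ms > 0 && rate_hz > 0.0);
    audio_seconds = (double)samples / rate_hz;
    elapsed_seconds = (double)elapsed_ms / 1000.0;
    /* The divisor 2 is inferred from the printed results, not from the source. */
    return audio_seconds / (2.0 * elapsed_seconds);
}
```

With the values from the results above, 3620 ms gives about 12.83, 2577 ms gives about 18.02, and 223 ms gives about 208.25, matching the printed output.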


The 68K Amiga Version was compiled with this command line (GCC)
gcc FFT.c FFTDemo.c -o2 -o ../FFTDemo_68K_GCC -m68040 -m68881

And with Amiblitz3:
Amiblitz3 -s FFTDemo.ab3 -e /FFTDemo_68K_AB3 -release

The x86 Win32 Version was compiled with this command line (Visual Studio)
/O2 /Ob2 /GL /D "WIN32" /FD /MD /fp:fast /W3 /nologo /c /Zi

Now I am quite curious about PPC results, and whether someone can explain why GCC loses so badly against Amiblitz3. What I know:

1. Amiblitz has FPU register optimization, but GCC should have this too, no?
2. Amiblitz does not have array indexing and must compute the memory offset of the complex floats manually. I would have assumed that this would be slower.

Last edited by Wanderer on 04-Mar-2012 at 10:12 PM.
Last edited by Wanderer on 04-Mar-2012 at 10:10 PM.

bernd_afa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 11:40:54
#125 ]
Cult Member
Joined: 14-Apr-2006
Posts: 829
From: Unknown

@Wanderer

Can you post the x86 asm output of Visual C for the function where most of the CPU cycles are spent? You can also take a screenshot of the VC debugger in asm output mode, showing the asm code of the function that does most of the processing.

Is this right?

void _fftD(fftH *fft, fftCD *buffer, int inverse) {

I don't think that native x86 can be so much faster. Also, the x86 file is very small.
It would be best to do the same as in your FFT include and print some values that show the code is executed correctly, so that Visual C is not playing an optimizer trick because it knows the result of your code is not used.

But when you post the asm output, we can also see which asm instructions are produced.

Last edited by bernd_afa on 06-Mar-2012 at 11:49 AM.
Last edited by bernd_afa on 06-Mar-2012 at 11:42 AM.
Last edited by bernd_afa on 06-Mar-2012 at 11:41 AM.

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 13:02:00
#126 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Quote:

The 68K Amiga Version was compiled with this command line (GCC)
gcc FFT.c FFTDemo.c -o2 -o ../FFTDemo_68K_GCC -m68040 -m68881



-o2 for optimisation? Should that not be -O2?




Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 15:45:04
#127 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@bernd_afa

Here is the x86 code in debug mode (I don't know how to display the disassembly in release mode):

void _fft(fftH *fft, fftCF *buffer, int inverse) {
004119C0  push        ebp  
004119C1  mov         ebp,esp
004119C3  sub         esp,74h
004119C6  push        ebx  
004119C7  push        esi  
004119C8  push        edi  

  float ur,ui,wr,wi,tr,ti;
  int k,j,i;

  int le  = 0;
004119C9  mov         dword ptr [le],0
  int le2 = 1;
004119D0  mov         dword ptr [le2],1
  float w = ((float)M_PI);
004119D7  fld         dword ptr [__real@40490fdb (415884h)]
004119DD  fstp        dword ptr [w]
  float df=1.0f;
004119E0  fld1            
004119E2  fstp        dword ptr [df]
  if (inverse) df = -1.0f;
004119E5  cmp         dword ptr [inverse],0
004119E9  je          _fft+34h (4119F4h)
004119EB  fld         dword ptr [__real@bf800000 (41587Ch)]
004119F1  fstp        dword ptr [df]

  for (k = 0; k < fft->order; k++) {
004119F4  mov         dword ptr [k],0
004119FB  jmp         _fft+46h (411A06h)
004119FD  mov         eax,dword ptr [k]
00411A00  add         eax,1
00411A03  mov         dword ptr [k],eax
00411A06  mov         eax,dword ptr [fft]
00411A09  mov         ecx,dword ptr [k]
00411A0C  cmp         ecx,dword ptr [eax]
00411A0E  jge         _fft+1BEh (411B7Eh)
    le = le2;
00411A14  mov         eax,dword ptr [le2]
00411A17  mov         dword ptr [le],eax
    le2 <<= 1;
00411A1A  mov         eax,dword ptr [le2]
00411A1D  shl         eax,1
00411A1F  mov         dword ptr [le2],eax
    ur = 1.0f;
00411A22  fld1            
00411A24  fstp        dword ptr [ur]
    ui = 0.0f;
00411A27  fldz            
00411A29  fstp        dword ptr [ui]

    wr = cosf(w);
00411A2C  fld         dword ptr [w]
00411A2F  sub         esp,8
00411A32  fstp        qword ptr [esp]
00411A35  call        @ILT+225(_cos) (4110E6h)
00411A3A  add         esp,8
00411A3D  fstp        dword ptr [wr]
    wi = sinf(w) * df;
00411A40  fld         dword ptr [w]
00411A43  sub         esp,8
00411A46  fstp        qword ptr [esp]
00411A49  call        @ILT+190(_sin) (4110C3h)
00411A4E  add         esp,8
00411A51  fmul        dword ptr [df]
00411A54  fstp        dword ptr [wi]
    w /= 2.0f;
00411A57  fld         dword ptr [w]
00411A5A  fdiv        qword ptr [__real@4000000000000000 (415870h)]
00411A60  fstp        dword ptr [w]

    for (j = 0; j < le; j++) {
00411A63  mov         dword ptr [j],0
00411A6A  jmp         _fft+0B5h (411A75h)
00411A6C  mov         eax,dword ptr [j]
00411A6F  add         eax,1
00411A72  mov         dword ptr [j],eax
00411A75  mov         eax,dword ptr [j]
00411A78  cmp         eax,dword ptr [le]
00411A7B  jge         _fft+1B9h (411B79h)
      for (i = j; i < fft->npoints; i += le2) {
00411A81  mov         eax,dword ptr [j]
00411A84  mov         dword ptr [i],eax
00411A87  jmp         _fft+0D2h (411A92h)
00411A89  mov         eax,dword ptr [i]
00411A8C  add         eax,dword ptr [le2]
00411A8F  mov         dword ptr [i],eax
00411A92  mov         eax,dword ptr [fft]
00411A95  mov         ecx,dword ptr [i]
00411A98  cmp         ecx,dword ptr [eax+4]
00411A9B  jge         _fft+18Ch (411B4Ch)
        tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
00411AA1  mov         eax,dword ptr [i]
00411AA4  add         eax,dword ptr [le]
00411AA7  mov         ecx,dword ptr [buffer]
00411AAA  fld         dword ptr [ecx+eax*8]
00411AAD  fmul        dword ptr [ur]
00411AB0  mov         edx,dword ptr [i]
00411AB3  add         edx,dword ptr [le]
00411AB6  mov         eax,dword ptr [buffer]
00411AB9  fld         dword ptr [eax+edx*8+4]
00411ABD  fmul        dword ptr [ui]
00411AC0  fsubp       st(1),st
00411AC2  fstp        dword ptr [tr]
        ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
00411AC5  mov         eax,dword ptr [i]
00411AC8  add         eax,dword ptr [le]
00411ACB  mov         ecx,dword ptr [buffer]
00411ACE  fld         dword ptr [ecx+eax*8]
00411AD1  fmul        dword ptr [ui]
00411AD4  mov         edx,dword ptr [i]
00411AD7  add         edx,dword ptr [le]
00411ADA  mov         eax,dword ptr [buffer]
00411ADD  fld         dword ptr [eax+edx*8+4]
00411AE1  fmul        dword ptr [ur]
00411AE4  faddp       st(1),st
00411AE6  fstp        dword ptr [ti]
        buffer[i+le].r = buffer[i].r - tr;
00411AE9  mov         eax,dword ptr [i]
00411AEC  mov         ecx,dword ptr [buffer]
00411AEF  fld         dword ptr [ecx+eax*8]
00411AF2  fsub        dword ptr [tr]
00411AF5  mov         edx,dword ptr [i]
00411AF8  add         edx,dword ptr [le]
00411AFB  mov         eax,dword ptr [buffer]
00411AFE  fstp        dword ptr [eax+edx*8]
        buffer[i+le].i = buffer[i].i - ti;
00411B01  mov         eax,dword ptr [i]
00411B04  mov         ecx,dword ptr [buffer]
00411B07  fld         dword ptr [ecx+eax*8+4]
00411B0B  fsub        dword ptr [ti]
00411B0E  mov         edx,dword ptr [i]
00411B11  add         edx,dword ptr [le]
00411B14  mov         eax,dword ptr [buffer]
00411B17  fstp        dword ptr [eax+edx*8+4]
        buffer[i].r += tr;
00411B1B  mov         eax,dword ptr [i]
00411B1E  mov         ecx,dword ptr [buffer]
00411B21  fld         dword ptr [ecx+eax*8]
00411B24  fadd        dword ptr [tr]
00411B27  mov         edx,dword ptr [i]
00411B2A  mov         eax,dword ptr [buffer]
00411B2D  fstp        dword ptr [eax+edx*8]
        buffer[i].i += ti;
00411B30  mov         eax,dword ptr [i]
00411B33  mov         ecx,dword ptr [buffer]
00411B36  fld         dword ptr [ecx+eax*8+4]
00411B3A  fadd        dword ptr [ti]
00411B3D  mov         edx,dword ptr [i]
00411B40  mov         eax,dword ptr [buffer]
00411B43  fstp        dword ptr [eax+edx*8+4]
      }
00411B47  jmp         _fft+0C9h (411A89h)
      tr = ur*wr - ui*wi;
00411B4C  fld         dword ptr [ur]
00411B4F  fmul        dword ptr [wr]
00411B52  fld         dword ptr [ui]
00411B55  fmul        dword ptr [wi]
00411B58  fsubp       st(1),st
00411B5A  fstp        dword ptr [tr]
      ui = ur*wi + ui*wr;
00411B5D  fld         dword ptr [ur]
00411B60  fmul        dword ptr [wi]
00411B63  fld         dword ptr [ui]
00411B66  fmul        dword ptr [wr]
00411B69  faddp       st(1),st
00411B6B  fstp        dword ptr [ui]
      ur = tr;
00411B6E  fld         dword ptr [tr]
00411B71  fstp        dword ptr [ur]
    }
00411B74  jmp         _fft+0ACh (411A6Ch)
  }
00411B79  jmp         _fft+3Dh (4119FDh)
}
00411B7E  pop         edi  
00411B7F  pop         esi  
00411B80  pop         ebx  
00411B81  mov         esp,ebp
00411B83  pop         ebp  
00411B84  ret              


Quote:

Is this right?

void _fftD(fftH *fft, fftCD *buffer, int inverse) {

That is the double-precision version of the FFT. I use the single-precision version, which is called _fft().

This is the speed result of the debug version:
Speed test for FFT + iFFT: (C, x86/Win32, float)
Time needed 646ms for 4096000 samples, => 71.89x speed @44100Hz/mono (test=1.9523, should be ~1.9523)

Still much faster than WinUAE.
The x86 file is very small; I do believe it does a lot of dead-code elimination. As I wrote above, the FFT is truly executed, since it computes the checksum (1.9523).
The optimizer would have to be very clever to cheat this (not impossible, though).

Last edited by Wanderer on 06-Mar-2012 at 03:47 PM.

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 15:51:58
#128 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@broadblues

I tried -O2. Now the optimizer is on. I don't know why -o2 didn't throw an error message.

Here is the Result:


Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 1620ms for 4096000 samples, => 28.67x speed @44100Hz/mono (test=1.9523, should be ~1.9523)

More than two times faster.
Now GCC wins over Amiblitz.

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:13:43
#129 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer


10.RAM Disk:FFTDemo> FFTDemo_PPC_GCC
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 19872ms for 4096000 samples, => 2.34x speed @44100Hz/mono (test=1.9523, should be ~1.9523)

This was on my SAM-Flex 733 without any modifications to the code, just a straight recompile (hence it reports 68040 when it's really PPC).

Here is the Amiblitz version on my SAM:

10.RAM Disk:FFTDemo> FFTDemo_68K_AB3
Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float)
Time needed 28561ms for 4096000 samples, => 1.63x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)

I can't run the 68k GCC version as I don't have an ixemul.library installed (I used to have one; I'll see if I can find it).







broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:26:36
#130 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@broadblues

I found an ixemul, but it reports 0ms, so something is up with the ixemul lib, as it clearly didn't run at infinite speed.

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:28:55
#131 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@broadblues

Sorry, the check for 68040 is just a check for the AMIGA define; I didn't make it that sophisticated.

2.34x speed native @733MHz? Hm... On the other hand, the JIT seems to be pretty good. Given that the GCC version is now 50% faster than the Amiblitz version, that would be approximately the same speed as native. I assume you used the uppercase -O2?

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:29:57
#132 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Quote:

I don't know why -o2 didn't throw an error message.


It's a valid option: it sets the output file to "2", and then your later switch -o FFT... overrode it. gcc allows this kind of thing so that you can vary switches along the command line for different inputs.

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:34:11
#133 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@Wanderer

Yes, uppercase -O2. I tried with -O3 too, but that was marginally slower.





bernd_afa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:35:22
#134 ]
Cult Member
Joined: 14-Apr-2006
Posts: 829
From: Unknown

@Wanderer

It's interesting to see how fast native x86 is when no optimizer is used. It compares well with native GCC, because then similar instructions are executed on all CPUs.

I compiled the 68k C version myself in AmiDevCpp. That's right, only -O3 is OK.

Your C version gets this on my system:

15.h0:wbstartup> "Ram Disk:FFTDemo/FFTDemo_68K_GCC"
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 3240ms for 4096000 samples, => 14.33x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
15.h0:wbstartup>

My 68k C build with option -O3:

15.h0:wbstartup> h1:amidevcpp/bernd/test/fftbench.exe
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 920ms for 4096000 samples, => 50.48x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
15.h0:wbstartup>

And Amiblitz is slower:

15.h0:wbstartup> "Ram Disk:FFTDemo_68K_AB3"
Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float)
Time needed 2114ms for 4096000 samples, => 21.97x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)

Looking at the GCC code for 68k, the generated code is quite different from the formula in your source. This is the speed-critical part:

for (j = 0; j < le; j++) {
  for (i = j; i < fft->npoints; i += le2) {
    tr = buffer[i+le].r * ur - buffer[i+le].i * ui;
    ti = buffer[i+le].r * ui + buffer[i+le].i * ur;
    buffer[i+le].r = buffer[i].r - tr;
    buffer[i+le].i = buffer[i].i - ti;
    buffer[i].r += tr;
    buffer[i].i += ti;
  }

Maybe the FFT is a well-known standard function and the GCC optimizer then uses an optimized snippet for it.

125D52DA): FSMOVE.S 0(A0,D4.L*8),FP0
125D52E0): FSMOVE.X FP0,FP3
125D52E4): FSMUL.X FP4,FP3
125D52E8): FSMOVE.S 4(A0,D4.L*8),FP1
125D52EE): FSMOVE.X FP1,FP2
125D52F2): FSMUL.X FP5,FP2
125D52F6): FSSUB.X FP2,FP3
125D52FA): FSMUL.X FP5,FP0
125D52FE): FSMUL.X FP4,FP1
125D5302): FSADD.X FP1,FP0
125D5306): FSMOVE.S (A0),FP2
125D530A): FSSUB.X FP3,FP2
125D530E): FMOVE.S FP2,0(A0,D4.L*8)
125D5314): FSMOVE.S 4(A0),FP1
125D531A): FSSUB.X FP0,FP1
125D531E): FMOVE.S FP1,4(A0,D4.L*8)
125D5324): FSADD.S (A0),FP3
125D5328): FMOVE.S FP3,(A0)
125D532C): FSADD.S 4(A0),FP0
125D5332): FMOVE.S FP0,4(A0)
125D5338): ADD.L D5,D0
125D533A): ADDA.L D2,A0
125D533C): MOVE.L 4(A2),D1
125D5340): CMP.L D1,D0
125D5342): BLT.S __fft+$CA ;125D52DA


Here is the value of the native x86 version:

C:\Users\pc>H:\test\FFTDemo_Win32.exe
Speed test for FFT + iFFT: (C, x86/Win32, float)
Time needed 296ms for 4096000 samples, => 156.89x speed @44100Hz/mono (test=1.9523, should be ~1.9523)

Native x86 is then only about 3.1x faster (156.89x vs. 50.48x). It would be interesting to see what asm code they use to get such a big speedup.

Last edited by bernd_afa on 06-Mar-2012 at 04:45 PM.
Last edited by bernd_afa on 06-Mar-2012 at 04:35 PM.

broadblues 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:40:00
#135 ]
Amiga Developer Team
Joined: 20-Jul-2004
Posts: 4446
From: Portsmouth England

@broadblues

Wow, this one is a bit surprising: I used -Os (optimise for size) instead and got a substantial speedup!


10.RAM Disk:FFTDemo/src> gcc -Os FFT.c FFTDemo.c -o ../FFTDemo_PPC_GCC_OS
10.RAM Disk:FFTDemo/src> /
10.RAM Disk:FFTDemo> FFTDemo_PPC_GCC_OS
Speed test for FFT + iFFT: (C, 68040+FPU, float)
Time needed 5278ms for 4096000 samples, => 8.80x speed @44100Hz/mono (test=1.9523, should be ~1.9523)
10.RAM Disk:FFTDemo>

bernd_afa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 16:58:31
#136 ]
Cult Member
Joined: 14-Apr-2006
Posts: 829
From: Unknown

@broadblues

Can you upload the PPC version plus source so others can test?
Strange that it gives such a big speed boost. Maybe you can compile Blender and the speed test with that setting, and it will be faster than on an AmigaOne.

Last edited by bernd_afa on 07-Mar-2012 at 08:36 AM.
Last edited by bernd_afa on 06-Mar-2012 at 05:02 PM.

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 18:34:10
#137 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@bernd_afa
Please don't make copies and provide them for download. That will quickly cause confusion about the benchmark. Better to send the changes to me and I will update the archive. Your exe, by the way, reaches 34x speed, which means it is slightly faster than GCC with 28x.
If you think Amiblitz is behind because of the code, I can take a look at it. The C code is pretty "dense", which is good for the C optimizer. But I think I should be able to get close with the Amiblitz code.

BTW, how can I easily detect the system it is compiled for, so I can add it to the shell output?

"AMIGA" seems to be set on MOS too. Is there something like 68K, PPC or MC68020 etc.?
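
One common approach to this question is to test the compiler's predefined macros. The sketch below is only an assumption about which macros the relevant toolchains define (for example __PPC__, __mc68040__, __amigaos4__, __MORPHOS__, _WIN32); the exact set depends on the compiler and its version, so it should be checked against the compiler actually used. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* CPU family the binary was compiled for, based on predefined macros. */
static const char *target_cpu(void)
{
#if defined(__PPC__) || defined(__powerpc__)
    return "PPC";
#elif defined(__mc68040__)   /* assumed to be set by GCC for -m68040 */
    return "68040";
#elif defined(__mc68020__)
    return "68020";
#elif defined(__mc68000__) || defined(__m68k__)
    return "68K";
#elif defined(_M_X64) || defined(__x86_64__)
    return "x86-64";
#elif defined(_M_IX86) || defined(__i386__)
    return "x86";
#else
    return "unknown CPU";
#endif
}

/* Operating system target; the more specific checks come first because
   AMIGA may also be defined on MorphOS, as noted in the thread. */
static const char *target_os(void)
{
#if defined(__amigaos4__)
    return "AmigaOS 4";
#elif defined(__MORPHOS__)
    return "MorphOS";
#elif defined(__AROS__)
    return "AROS";
#elif defined(_WIN32)
    return "Win32";
#elif defined(AMIGA) || defined(__amigaos__)
    return "AmigaOS (classic)";
#else
    return "unknown OS";
#endif
}

/* Write "OS, CPU" into buf; returns the snprintf result. */
int format_target(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s, %s", target_os(), target_cpu());
}
```

The resulting string could replace the hard-coded "68040+FPU" part of the benchmark's output line.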

If I want to get rid of ixemul, there is a compiler switch, right? But then I won't be able to do shell output with printf, will I?


For the Benchmark Code:
I sum various coefficients of the FFT in each loop. I doubt that the optimizer is able to predict this. Also, the input is rewritten every loop, so the optimizer cannot detect a loop invariant here. But I can make it even more difficult to predict by initializing the time buffer with white noise that is different every loop. The speedup with -Os on the PPC seems quite suspicious.
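
The white-noise idea can be sketched as follows. This is not the FFTDemo code: the helper names are hypothetical, and the generator is a plain linear congruential generator with the well-known Numerical Recipes constants, chosen only because it is short and behaves the same on every platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: next pseudo-random sample in [-1, 1). */
float next_noise(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    /* top 24 bits give [0, 2) after scaling, then shift to [-1, 1) */
    return (float)(*state >> 8) / 8388608.0f - 1.0f;
}

/* Refill the time-domain buffer with fresh noise before every loop,
   so the input of the transform is never loop-invariant. */
void fill_noise(float *buf, int n, uint32_t *state)
{
    int i;
    assert(buf != 0 && n > 0 && state != 0);
    for (i = 0; i < n; i++)
        buf[i] = next_noise(state);
}

/* Fold every stride-th value of a result block into a running sum.
   Printing the final sum makes the transform's output observable,
   so the compiler cannot discard the work as dead code. */
double fold_checksum(double sum, const float *buf, int n, int stride)
{
    int i;
    assert(buf != 0 && stride > 0);
    for (i = 0; i < n; i += stride)
        sum += buf[i];
    return sum;
}
```

Because the generator is deterministic for a given seed, every platform sees the same input and can still be compared against one expected checksum value.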

Last edited by Wanderer on 06-Mar-2012 at 06:40 PM.
Last edited by Wanderer on 06-Mar-2012 at 06:37 PM.
Last edited by Wanderer on 06-Mar-2012 at 06:36 PM.

wawa 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 18:41:32
#138 ]
Elite Member
Joined: 21-Jan-2008
Posts: 6259
From: Unknown

@Wanderer


-noixemul ?

Tuxedo 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 18:56:22
#139 ]
Elite Member
Joined: 28-Nov-2003
Posts: 2341
From: Perugia, ITALY

@Wanderer

On my Peg2 @ 1131MHz running AmigaOS 4.1 Update 4 I get:


8.RAM Disk:FFTDemo> FFTDemo_68K_AB3
Speed test for FFT/iFFT: (AmiBlitz3, 68040+FPU, float)
Time needed 14007ms for 4096000 samples, => 3.32x speed @44100Hz/mono (test= 1.9523, should be ~1.9523)


The GCC version, as broadblues said, doesn't work...

_________________
Simone"Tuxedo"Monsignori, Perugia, ITALY.

Wanderer 
Re: HD-REC on OS4.x on X1000 ?
Posted on 6-Mar-2012 18:59:09
#140 ]
Cult Member
Joined: 16-Aug-2008
Posts: 654
From: Germany

@wawa

-noixemul doesn't let me do formatted output and timing anymore. I would need to write that via the AmigaOS API.

@people
Please hold off on benchmarking; I am making some changes...

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle