Poster | Thread |
NutsAboutAmiga
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 16:04:44
| | [ #161 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| @damocles
Quote:
Does anyone really care about Altivec in 2013? |
Yes, Altivec is great: you can execute about 4 instructions at the same time. But it has its problems. First of all, you need to compile the binary with Altivec optimization enabled.
GCC does not allow you to write inline assembler code both with it and without it in the same binary, so you can't switch between them while the program runs; you end up with two exe files, or a common library, or something like that.
The 2nd problem is poor documentation on the internet; I have spent a lot of time looking for guides on how to write Altivec assembler code.
The 3rd problem: most AmigaONE/Sam users do not have a CPU that supports it.
So in the end, because of issues 2 and 3, there are not many who know how to do it, and if they did, they might not have the hardware to do it on; besides, only a handful of people would be able to make use of it.
Altivec has its own registers, and each register holds a temporary value; as long as the operations do not share registers, they can go in parallel. This is its advantage.
For example, if you have something like this:
Load normal register 0 into vector 0
Add 10 to vector 0
Store vector 0 to normal register 0
This code is just as slow as normal code, but if you unroll loops and do this:
Load normal register 0 into vector 0
Load normal register 1 into vector 1
Load normal register 2 into vector 2
Add 10 to vector 0
Add 10 to vector 1
Add 10 to vector 2
Store vector 0 to normal register 0
Store vector 1 to normal register 1
Store vector 2 to normal register 2
Then the code is going to be executed many times faster than normal code.
Last edited by NutsAboutAmiga on 30-Oct-2013 at 10:57 AM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 04:06 PM.
_________________ http://lifeofliveforit.blogspot.no/ Facebook::LiveForIt Software for AmigaOS |
|
|
|
damocles
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 18:32:07
| | [ #162 ] |
|
|
|
Super Member |
Joined: 22-Dec-2007 Posts: 1719
From: Unknown | | |
|
| @NutsAboutAmiga
So basically, no body cares about Altivec in 2013.
_________________ Dammy |
|
|
|
minator
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 20:28:37
| | [ #163 ] |
|
|
|
Cult Member |
Joined: 23-Mar-2004 Posts: 989
From: Cambridge | | |
|
| @NutsAboutAmiga
Quote:
Then the code is going to be executed many times faster than normal code. |
Actually it will be slower. Probably a lot slower.
I don't know if you've just explained it badly but it looks like you're trying to use the AltiVec unit to do normal (scalar) maths. This makes no sense whatsoever.
You're also moving things to and from the normal scalar registers. This adds a lot of overhead so should be avoided unless absolutely necessary.
Here's a better example:
You have an array of 32 bit numbers and you want to increment them by 10.
In a loop do this:
load vector0 from memory (this loads 4x32 bit numbers)
vector-add a vector of 10s to vector0 (this adds 4x32 bit numbers)
store vector0 to memory (this stores 4x32 bit numbers)
That will do 4 adds per add instruction but there's a load of overhead. Unrolling it will speed it up.
BTW You should also be using intrinsics instead of assembly. They're much easier to use and make the compiler do a load of work for you.
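The loop described above could be sketched in C with intrinsics roughly as follows. This is only an illustration under some assumptions: the function name add10 is made up, int is assumed to be 32 bits, and the Altivec path is only compiled when GCC defines __ALTIVEC__ (for example with -maltivec). Without that, a plain 4-way unrolled loop is used so the sketch builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#include <stdint.h>
#endif

/*
 * Add 10 to every element of data[0..n-1].
 * Assumes int is 32 bits wide, as it is on 32-bit PowerPC.
 * add10 is a hypothetical name used only for this sketch.
 */
void add10(int *data, size_t n)
{
    size_t i = 0;

    assert(data != NULL || n == 0);

#if defined(__ALTIVEC__)
    /* vec_ld/vec_st ignore the low 4 address bits, so handle elements
       one at a time until data + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(data + i) & 15u) != 0) {
        data[i] += 10;
        i++;
    }

    /* A vector holding four copies of 10. */
    const vector signed int tens = vec_splat_s32(10);

    for (; i + 4 <= n; i += 4) {
        vector signed int v = vec_ld(0, data + i); /* load 4 x 32 bit  */
        v = vec_add(v, tens);                      /* add 4 x 32 bit   */
        vec_st(v, 0, data + i);                    /* store 4 x 32 bit */
    }
#else
    /* Portable fallback: the same work as four independent scalar adds. */
    for (; i + 4 <= n; i += 4) {
        data[i + 0] += 10;
        data[i + 1] += 10;
        data[i + 2] += 10;
        data[i + 3] += 10;
    }
#endif

    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        data[i] += 10;
    }
}
```

Unrolling the vector loop further (two or more loads before the adds) would be the next step, but that is left out here to keep the sketch short.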
_________________ Whyzzat? |
|
|
|
NutsAboutAmiga
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 20:31:59
| | [ #164 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| @damocles
Well, I care, so there is someone who cares.
Well, I don't think the problem is that people don't care; it's just a bit more hassle to get something out of it. But it might be worth it, if you have Altivec, that is.
The truth is that it might be the tiny bit extra that is needed to play HD video at acceptable speed on the AmigaONE-X1000, for example, but then it's about having someone who knows what they are doing.
Even normal PowerPC assembler optimized routines might make a big difference if someone took their time to do it.
Last edited by NutsAboutAmiga on 30-Oct-2013 at 10:58 AM.
_________________ http://lifeofliveforit.blogspot.no/ Facebook::LiveForIt Software for AmigaOS |
|
|
|
tonyw
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 20:36:44
| | [ #165 ] |
|
|
|
Elite Member |
Joined: 8-Mar-2003 Posts: 3240
From: Sydney (of course) | | |
|
| @NutsAboutAmiga
Quote:
Even normal powerpc assembler optimized routines might make a big difference if someone took their time to do it.
|
You can't write better assembler code than the compiler generates from C. It has a lot more insight than you have.
_________________ cheers tony
Hyperion Support Forum: http://forum.hyperion-entertainment.biz/index.php |
|
|
|
NutsAboutAmiga
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 20:40:46
| | [ #166 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| |
|
|
NutsAboutAmiga
| |
Re: Next Freescale high performance PPC chip. Posted on 29-Oct-2013 20:46:50
| | [ #167 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| @tonyw
Sorry, that's easy; the C compiler does a crap job at it, really.
Do objdump -S on your exe file and see what it has done. The problem with C in general is that too much is pushed out to RAM too often, and there is often a bunch of code that can be removed.
But you don't write a full program in assembler; you only optimize the inner loops. This is where it makes the most sense: this is where you have a lot of repetitive code being executed over and over again, and this is why it does make a difference.
Let's say you have a routine that is executed 10000 to 100000 times or more.
There are also cases where you have an IF condition in C that can be replaced by the ISEL assembler instruction, eliminating branch jumps.
If the programmer is not too stupid, he might be able to get a few extra cycles out of C too.
It does require an understanding of what the C language generates, and of the consequences of writing something this way instead of that way.
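To make the ISEL point concrete, here is a rough C sketch of the same idea: an IF that picks one of two values, rewritten so that no branch is needed. The function names are made up for illustration. Whether the compiler actually emits isel depends on the target CPU and options (GCC has a -misel switch for cores that support it), so treat this as a sketch of the technique rather than a guarantee about generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward version: typically a compare plus a conditional branch. */
int32_t clamp_branch(int32_t x, int32_t limit)
{
    if (x > limit) {
        return limit;
    }
    return x;
}

/* Branch-free version: build an all-ones or all-zeros mask from the
   comparison and use it to select one of the two values. */
int32_t clamp_select(int32_t x, int32_t limit)
{
    /* mask is 0xFFFFFFFF when x > limit, otherwise 0 */
    uint32_t mask = (uint32_t)0 - (uint32_t)(x > limit);
    uint32_t bits = ((uint32_t)limit & mask) | ((uint32_t)x & ~mask);

    /* Converting back to signed relies on the usual two's complement
       behaviour, which GCC provides. */
    return (int32_t)bits;
}

/* Small check that both versions agree over a range of inputs. */
void clamp_self_check(void)
{
    int32_t x;
    for (x = -20; x <= 20; x++) {
        assert(clamp_select(x, 10) == clamp_branch(x, 10));
    }
}
```

A plain `(x > limit) ? limit : x` is often enough for the compiler to pick a select instruction by itself; checking the objdump -S output is the way to find out.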
Last edited by NutsAboutAmiga on 30-Oct-2013 at 09:51 AM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 09:02 PM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 08:58 PM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 08:56 PM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 08:54 PM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 08:51 PM. Last edited by NutsAboutAmiga on 29-Oct-2013 at 08:50 PM.
_________________ http://lifeofliveforit.blogspot.no/ Facebook::LiveForIt Software for AmigaOS |
|
|
|
olegil
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 9:21:11
| | [ #168 ] |
|
|
|
Elite Member |
Joined: 22-Aug-2003 Posts: 5895
From: Work | | |
|
| @tonyw
Well, you can, but it usually takes too much effort to be worth it, as the same effort could be invested in rewriting things that are not optimized enough from the programmer's side.
Example: Load/store arch (ARM in this case), needed to write some memory-mapped registers in a bootloader. Using macros/defines for readability makes it possible to use assembly, but the compiler knew that the same value was ending up in two of the registers (the programmer didn't, as he filled in the values after writing the code). This means that some load instructions were unnecessary; multiple stores from a single load saved time and space. Now, rewriting the assembly to be as efficient as what the C compiler ended up with wouldn't have been difficult by unwrapping the macros and looking at the values to be written. But then what happens if you need to change a value? Assembly: complete rewrite. C: single macro change.
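A minimal sketch of the kind of macro-based register setup described above, assuming made-up register names and values (REG_CTRL, REG_MODE, REG_CLKDIV and the fake_regs array are all hypothetical; a real bootloader would use fixed hardware addresses instead of an array). Two of the values happen to be equal, so an optimizing compiler may load that constant once and store it twice, and changing a value later is a one-line edit.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a block of memory-mapped registers (hypothetical). */
static volatile uint32_t fake_regs[4];

#define REG_CTRL    (&fake_regs[0])
#define REG_MODE    (&fake_regs[1])
#define REG_CLKDIV  (&fake_regs[2])

/* Values filled in after the code was written. CTRL_VALUE and
   MODE_VALUE happen to be the same number. */
#define CTRL_VALUE    0x00000101u
#define MODE_VALUE    0x00000101u
#define CLKDIV_VALUE  0x00000008u

#define REG_WRITE(reg, value)  (*(reg) = (value))

void init_registers(void)
{
    /* volatile forces every store to happen, but the compiler is free
       to reuse one loaded constant for the first two writes. */
    REG_WRITE(REG_CTRL, CTRL_VALUE);
    REG_WRITE(REG_MODE, MODE_VALUE);
    REG_WRITE(REG_CLKDIV, CLKDIV_VALUE);
}

/* Small check that the writes landed where expected. */
void init_registers_self_check(void)
{
    init_registers();
    assert(fake_regs[0] == CTRL_VALUE);
    assert(fake_regs[1] == MODE_VALUE);
    assert(fake_regs[2] == CLKDIV_VALUE);
}
```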
In other instances it makes perfect sense. For instance, AVR-GCC insists on pushing register 1 (which it uses as a zero EVERYWHERE IN THE CODE) and register 0 (temp-reg) before copying the status reg to reg 0 and pushing AGAIN, even if you write the code in your interrupt so that it never changes the status register (simple move/store etc). I went from push/mov/push/ser/pop/mov/pop/rts to just ser/rts; it took me VERY little time to change, and it really helped with the performance. This was the chipselect line on an SPI slave implementation. Similar fixes were done to a few other interrupts (like setting aside registers for status instead of using the stack; I used 6 registers for 3 copies of status and my prime data reg). Without the fixes, the implementation needed a 14 USD FPGA; I managed it with a 1 USD MCU.
_________________ This weeks pet peeve: Using "voltage" instead of "potential", which leads to inventing new words like "amperage" instead of "current" (I, measured in A) or possible "charge" (amperehours, Ah or Coulomb, C). Sometimes I don't even know what people mean. |
|
|
|
KimmoK
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 10:11:17
| | [ #169 ] |
|
|
|
Elite Member |
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland | | |
|
| @damocles
"So basically, no body cares about Altivec in 2013."
Almost every mainstream CPU has multimedia instructions; they just have different names.
MMX, SSE, 3DNow, VMX, Altivec, NEON etc...
So, almost everybody cares about Altivec (=multimedia instruction unit). (And most devs let the compiler do the vectorization/optimization.)
Last edited by KimmoK on 30-Oct-2013 at 10:12 AM.
_________________ - KimmoK // For freedom, for honor, for AMIGA // // Thing that I should find more time for: CC64 - 64bit Community Computer? |
|
|
|
olegil
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 12:00:28
| | [ #170 ] |
|
|
|
Elite Member |
Joined: 22-Aug-2003 Posts: 5895
From: Work | | |
|
| @minator
For what it's worth, I understood what he was saying: if you unroll loops you can vectorise scalar math. You just took it one step further WHILE saying he was completely wrong.
_________________ This weeks pet peeve: Using "voltage" instead of "potential", which leads to inventing new words like "amperage" instead of "current" (I, measured in A) or possible "charge" (amperehours, Ah or Coulomb, C). Sometimes I don't even know what people mean. |
|
|
|
olegil
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 12:04:26
| | [ #171 ] |
|
|
|
Elite Member |
Joined: 22-Aug-2003 Posts: 5895
From: Work | | |
|
| @damocles
Maybe no body cares, but a lot of minds care.
SIMD is important in 2013, because it means you can process more data per clock, and if everyone else uses it then it becomes essential.
For instance, small ARM processors completely suck at general processing but excel at video compression/decompression, while consuming hardly any power from a tiny battery.
_________________ This weeks pet peeve: Using "voltage" instead of "potential", which leads to inventing new words like "amperage" instead of "current" (I, measured in A) or possible "charge" (amperehours, Ah or Coulomb, C). Sometimes I don't even know what people mean. |
|
|
|
damocles
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 12:28:39
| | [ #172 ] |
|
|
|
Super Member |
Joined: 22-Dec-2007 Posts: 1719
From: Unknown | | |
|
| @KimmoK
Quote:
Almost every mainstream CPU has multimedia instructions, they only have different names. MMX, SSE, 3DNow, VMX, Altivec, NEON etc... |
No, don't go there. I specifically said Altivec and nothing else. According to NutsAboutAmiga, most AmigaOne and SAM owners do not have CPUs with Altivec. Since it has to be compiled in, just how many of the Amiga OS4 binaries at the various Amiga OS4 file depots were compiled with Altivec, versus compiled without it?
If Trevor is producing A1X#?Ks that do not have Altivec, a tiny population is going to care about it; the vast majority will not.
_________________ Dammy |
|
|
|
olegil
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 14:22:51
| | [ #173 ] |
|
|
|
Elite Member |
Joined: 22-Aug-2003 Posts: 5895
From: Work | | |
|
| @damocles
Circular argument.
-"Feature X needs improving, as it's just too hard to use."
-"But no one uses that feature, since it's too hard to use."
We have machines with Altivec, therefore developers should be encouraged to utilize it rather than hardware manufacturers being encouraged to drop it.
_________________ This weeks pet peeve: Using "voltage" instead of "potential", which leads to inventing new words like "amperage" instead of "current" (I, measured in A) or possible "charge" (amperehours, Ah or Coulomb, C). Sometimes I don't even know what people mean. |
|
|
|
NutsAboutAmiga
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 14:23:11
| | [ #174 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| @damocles
We don't have any numbers on that, as we do not know the sales figures; it could be 50/50, 60/40 or 40/60, or any combination. What we do know is that there are more CPUs that do not support it than ones that do.
603, 604, G3, AMCC 440, P5040, P5020 and AMCC 460 are not Altivec compliant.
G4 and PA6T are Altivec compliant.
(G3 and G4 each cover many different CPU models.)
Well, it might be nice to find out; maybe we should start a poll.
But it looks like there are more CPUs, current and upcoming, that do not support it than ones that do. Unless you optimize for your own interests, it's more practical to just do normal assembler optimizing instead.
Last edited by NutsAboutAmiga on 30-Oct-2013 at 02:41 PM. Last edited by NutsAboutAmiga on 30-Oct-2013 at 02:24 PM.
_________________ http://lifeofliveforit.blogspot.no/ Facebook::LiveForIt Software for AmigaOS |
|
|
|
damocles
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 14:32:01
| | [ #175 ] |
|
|
|
Super Member |
Joined: 22-Dec-2007 Posts: 1719
From: Unknown | | |
|
| @olegil
Quote:
We have machines with Altivec, |
Which ones and how many?
_________________ Dammy |
|
|
|
NutsAboutAmiga
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 14:35:54
| | [ #176 ] |
|
|
|
Elite Member |
Joined: 9-Jun-2004 Posts: 12818
From: Norway | | |
|
| |
|
|
damocles
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 15:10:41
| | [ #177 ] |
|
|
|
Super Member |
Joined: 22-Dec-2007 Posts: 1719
From: Unknown | | |
|
| @NutsAboutAmiga
Quote:
603, 604, G3, AMCC 440, P5040, P5020 and AMCC 460 are not Altivec compliant. G4 and PA6T are Altivec compliant. (G3 and G4 each cover many different CPU models.) Well, it might be nice to find out; maybe we should start a poll. But it looks like there are more CPUs, current and upcoming, that do not support it than ones that do. Unless you optimize for your own interests, it's more practical to just do normal assembler optimizing instead. |
It looks to me that the lack of Altivec support in Trevor's upcoming line of computers will not hurt his future sales at all. Too few Amiga OS4 systems have Altivec, and those few that do, do not have a massive number of applications/games that require Altivec in the first place.
_________________ Dammy |
|
|
|
olegil
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 15:31:40
| | [ #178 ] |
|
|
|
Elite Member |
Joined: 22-Aug-2003 Posts: 5895
From: Work | | |
|
| @damocles
It looks to me that the lack of Altivec hurts sales of upcoming AmigaOnes day in and day out, judging from the snide remarks on online forums regarding said lack.
On the other hand, PA6T wasn't all that impressive anyway.
As a potential customer, I for one would very much welcome Altivec. But as evidenced by the post I link to in my sig, it's not a deal-breaker for me. Current lack of funds (or rather, abundance of things with higher importance to spend them on) is worse.
_________________ This weeks pet peeve: Using "voltage" instead of "potential", which leads to inventing new words like "amperage" instead of "current" (I, measured in A) or possible "charge" (amperehours, Ah or Coulomb, C). Sometimes I don't even know what people mean. |
|
|
|
KimmoK
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 16:58:51
| | [ #179 ] |
|
|
|
Elite Member |
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland | | |
|
| |
|
|
broadblues
| |
Re: Next Freescale high performance PPC chip. Posted on 30-Oct-2013 18:07:21
| | [ #180 ] |
|
|
|
Amiga Developer Team |
Joined: 20-Jul-2004 Posts: 4446
From: Portsmouth England | | |
|
| @KimmoK
Quote:
Is it possible to tell the compiler which code segments to optimize for Altivec and which not?
|
Only if the code is in separate object files. If you enable Altivec then the compiler will enable optimisation as well and so optimise the general code, so you must separate out all your Altivec code from the general code.
Not difficult to do once you know, but decidedly confusing for a while if you don't.
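As a rough illustration of that split, the sketch below shows the dispatch side in C: one function pointer that is set once, pointing either at the plain routine or at the Altivec one. All names here are made up. In a real build inc_altivec would live in its own source file compiled with -maltivec, and the cpu_has_altivec flag would come from whatever CPU query the OS provides (not shown, since that part is system specific). Here both routines are plain C so the sketch builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature shared by both implementations (hypothetical). */
typedef void (*inc_fn)(int *data, size_t n);

/* General version: belongs in an object file built WITHOUT -maltivec. */
static void inc_scalar(int *data, size_t n)
{
    size_t i;

    assert(data != NULL || n == 0);
    for (i = 0; i < n; i++) {
        data[i] += 10;
    }
}

/* Placeholder for the version that would sit in a separate object file
   built WITH -maltivec. It just calls the scalar code in this sketch. */
static void inc_altivec(int *data, size_t n)
{
    inc_scalar(data, n);
}

/* Pick an implementation once at startup. The caller supplies the
   result of its own CPU check as cpu_has_altivec. */
inc_fn choose_inc(int cpu_has_altivec)
{
    return cpu_has_altivec ? inc_altivec : inc_scalar;
}
```

The rest of the program only ever calls through the pointer returned by choose_inc, so a single exe can run on CPUs with and without Altivec.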
_________________ BroadBlues On Blues BroadBlues On Amiga Walker Broad |
|
|
|