Deniil715
Dual/single DIMM memory bandwidth Posted on 15-Oct-2012 9:17:53 [ #1 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 15-Oct-2012 10:26:28 [ #2 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
@Deniil715
RageMem results (http://www.amiga-news.de/en/news/AN-2012-02-00011-EN.html):
READ32: 2857 MB/Sec
READ64: 4000 MB/Sec
WRITE32: 2732 MB/Sec
WRITE64: 3383 MB/Sec
It would be nice to see new RageMem benchmark results. I imagine video RAM access should have increased 10x by now.
UPDATE: I doubt we have an application that could put any big stress on the x1000 memory bus, even if only one channel is used. When AOS4 gets multicore support, things might change.
Last edited by KimmoK on 15-Oct-2012 at 11:35 AM.
_________________
- KimmoK
// For freedom, for honor, for AMIGA //
// Thing that I should find more time for: CC64 - 64bit Community Computer?
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 15-Oct-2012 12:06:28 [ #3 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@KimmoK
You don't say whether you have single or dual DIMMs, but your READ64 figure suggests it may be interleaving, compared to my memspeed results.
Quote:
I doubt we have an application that could put any big stress on the x1000 memory bus, even if only one channel is used. When AOS4 gets multicore support, things might change.
HD video! This needs all the bandwidth we can get, especially if the blitting DMA to the gfx card also makes use of interleaving (which I assume, since the frame buffer memory will be interleaved between the banks).
_________________
- Don't get fooled by my avatar, I'm not like that (anymore, mostly... maybe only sometimes)
> Amiga Classic and OS4 developer for OnyxSoft.
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 15-Oct-2012 12:19:04 [ #4 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@KimmoK
So the G4 core is faster than the PA6T core. Interesting, and tragic at the same time... but the PA6T uses quite a bit less power, if I'm not mistaken. Or could it be that all this Altivec code was G4-optimized and may therefore run a number of percent slower on the PA6T core than it could, had it been properly optimized?
It's also funny that the X1000 memory bandwidth is almost the same as the L2 cache! Also notable is that tricky writing very obviously turns something upside down for the X1000 memory controller, while almost doubling it for the G4 controller.
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 16-Oct-2012 7:38:39 [ #5 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
@Deniil715
Time for coders to learn PA6T tricks.
The PA6T CPU core has a longer pipeline than the G4 core. Therefore it is slightly behind per MHz. But the PA6T has a multicore design, which is why it has such an insanely speedy memory interface. Really, it almost seems like the whole RAM of the x1000 performs as fast as the old L2.
Too bad that the PA6T is not sane price-wise. Otherwise it would be a very cool chip to play with. Remember that until now it has been available only for military use! So only very few people know the possible code optimization tricks (Varisys, ex-PaSemi, USA military industry).
BTW, I do not have an x1000 and am not planning to get one; I just provided that info.
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 16-Oct-2012 12:48:30 [ #6 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@KimmoK
A longer pipeline should make it faster per MHz, as long as you don't cause pipeline stalls/flushes. I would guess G4 (or even general 603) optimized code might very well cause more flushes than necessary in the PA6T core. Now let's see an updated SDK with a gcc that has a PA6T option.
The longer pipeline most likely makes each step simpler, which means shallower gate depth, which means it can run at a lower voltage and a higher frequency. There is a reason the 4GHz P-IV had a 32-stage(!) pipeline. The G4 has 6-8 stages or something, IIRC..? What does the PA6T have?
Last edited by Deniil715 on 16-Oct-2012 at 12:49 PM.
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 16-Oct-2012 14:10:47 [ #7 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 17-Oct-2012 15:44:03 [ #8 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@KimmoK
This thing is really designed for low power above all else, it seems. It looks like the pre-execution pipeline is 10 stages, after which a number of execution pipes of varying length, from 3 stages and up, are fed for the actual instruction execution.
Status: Offline

sundown
Re: Dual/single DIMM memory bandwidth Posted on 17-Oct-2012 23:52:16 [ #9 ]
Elite Member
Joined: 30-Aug-2003 Posts: 5120
From: Right here...
@Deniil715
Quote:
However, I ordered 2GB of memory but got only one module, so now I'm wondering if my machine runs at half the speed because of this
OS4.1 can only see 2GB; Linux can see more. Having two memory sticks will only marginally increase memory speed, not enough to notice in use. Having only one will not slow you down. In the beginning, I ran tests with one vs. two 2GB sticks and one vs. two 1GB sticks and saw very little difference at the time.
_________________
Hate tends to make you look stupid...
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 18-Oct-2012 9:34:19 [ #10 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
@sundown
"will only marginally increase memory speed"
I think it should almost double the memory speed.
"not enough to notice with usage."
That is most likely absolutely true. We do not have applications that would stress the memory interface.
Most of our applications spend their time inside the PA6T L2 cache boundaries!!!
Status: Offline

sundown
Re: Dual/single DIMM memory bandwidth Posted on 18-Oct-2012 18:37:21 [ #11 ]
Elite Member
Joined: 30-Aug-2003 Posts: 5120
From: Right here...
@KimmoK
Quote:
"will only marginally increase memory speed"
I think it should almost double the memory speed.
Dual memory isn't the same as dual core. If two memory sticks made that much of a difference, everyone would know about it by now, especially me.
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 25-Oct-2012 15:09:28 [ #12 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@sundown
Since a single stick seems to be almost as fast as the L2 cache (on a single core), I guess doubling the memory bandwidth is of little use (for now).
What needs to be tested, once we get dual-core support, is running two memtesters at the same time and comparing the memory speed each one measures with the speed measured by a single memtester running alone on one core. In that case we may see a substantial drop in memory bandwidth per core with only one stick.
Remains to be seen.
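The dual-core comparison proposed in this post could be sketched roughly as follows. This is only an illustration in Python, not the memtester tool mentioned above: the names `copy_bandwidth_mb_s`, `run_parallel` and `per_worker_drop` are made up for this sketch, and a plain buffer copy stands in for a real memory benchmark.

```python
import time
from multiprocessing import Pool


def copy_bandwidth_mb_s(size_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Copy a buffer (by default far larger than a 2 MB L2 cache) and return MB/s.

    The best of several runs is used to reduce scheduling noise.
    """
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src plus one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    return size_bytes / best / 1e6


def run_parallel(workers: int, size_bytes: int = 64 * 1024 * 1024) -> list[float]:
    """Run the same copy test in `workers` processes at once, one result each."""
    with Pool(workers) as pool:
        return pool.map(copy_bandwidth_mb_s, [size_bytes] * workers)


def per_worker_drop(alone_mb_s: float, together_mb_s: list[float]) -> float:
    """Fraction of bandwidth each worker loses when running concurrently."""
    average = sum(together_mb_s) / len(together_mb_s)
    return 1.0 - average / alone_mb_s
```

A comparison would call `run_parallel(1)` and `run_parallel(2)` and pass the results to `per_worker_drop`. On platforms that spawn rather than fork worker processes, those calls need to sit under an `if __name__ == "__main__":` guard. A drop near 0.5 would suggest the two workers are sharing one saturated memory channel; a drop near 0 would suggest the bus has headroom.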
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 25-Oct-2012 18:12:54 [ #13 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
@sundown
And dual-core support has no effect on the memory bus speed I meant.
Has no one tried running stream memspeed or ragemem in both one- and two-DIMM configurations??
On a PA6T one should theoretically get 16GB/sec READ or WRITE speed when DDR2 is in 1066MHz mode (IIRC, it is asynchronous with the CPU core clock on the PA6T). And "only" 8GB/sec when only one DIMM or bus is used.
PA6T L2 theoretical speed is 16GB/sec read AND 16GB/sec write (R+W can happen at the same time) when the core is at 2GHz.
(L1 does 32GB/sec read OR write when the core is at 2GHz)
UPDATE: After looking again at those memspeed results from amiga-news...
- L1 speed is 25% of the theoretical maximum.
- L2 and DDR2 speeds are around 25% of the theoretical maximum.
Interesting how far off those numbers are from what PA Semi has stated. Some room for optimization all around?
Last edited by KimmoK on 25-Oct-2012 at 07:21 PM.
Last edited by KimmoK on 25-Oct-2012 at 07:19 PM.
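The 8GB/sec and 16GB/sec figures in this post can be reproduced with a small calculation. The sketch below assumes the standard 64-bit (8-byte) DIMM data bus and reads "1066MHz mode" as 1066 million transfers per second; both are assumptions, not taken from PA Semi documentation. The function name `ddr_peak_mb_s` is made up for this example, and the measured value is the RageMem RAM READ64 figure quoted earlier in the thread.

```python
def ddr_peak_mb_s(mega_transfers_per_s: float, channels: int = 1, bus_bytes: int = 8) -> float:
    """Theoretical peak in MB/s: transfers per second x bytes per transfer x channels."""
    return mega_transfers_per_s * bus_bytes * channels


single = ddr_peak_mb_s(1066, channels=1)  # one DIMM / one bus
dual = ddr_peak_mb_s(1066, channels=2)    # two interleaved buses
measured_read64 = 4000                    # MB/s, RageMem RAM READ64 from the thread

print(f"single channel peak: {single:.0f} MB/s")
print(f"dual channel peak:   {dual:.0f} MB/s")
print(f"READ64 share of dual-channel peak: {measured_read64 / dual:.0%}")
```

Under these assumptions the peaks come out at about 8.5 and 17 GB/sec, which the post rounds to 8 and 16, and the measured READ64 lands a little under a quarter of the dual-channel peak, in line with the "around 25%" estimate.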
Status: Offline

sundown
Re: Dual/single DIMM memory bandwidth Posted on 25-Oct-2012 21:25:26 [ #14 ]
Elite Member
Joined: 30-Aug-2003 Posts: 5120
From: Right here...
@KimmoK & Deniil715
I have two 2GB memory sticks in my x1000. Here are the test results, with ragemem first and then ramspeed. Deniil715, you can compare with your single 2GB stick.
3.Sys_FC:> ragemem
RAGEMEM v0.37 - compiled 11/06/2010

CPU: P.A. Semi PWRficient PA6T-1682M B1 @ 1800 Mhz
Caches Sizes: L1: 64 KB - L2: 2048 KB - L3: none
Cache Line: 64

CPU MAX MIPS: 3083

L1
READ32: 6849 MB/Sec
READ64: 13676 MB/Sec
WRITE32: 6849 MB/Sec
WRITE64: 13677 MB/Sec

L2
READ32: 3266 MB/Sec
READ64: 5014 MB/Sec
WRITE32: 2538 MB/Sec
WRITE64: 4093 MB/Sec

RAM
READ32: 2857 MB/Sec
READ64: 3999 MB/Sec
WRITE32: 2731 MB/Sec
WRITE64: 3389 MB/Sec
WRITE: 352 MB/Sec (Tricky)

VIDEO BUS
READ: 14 MB/Sec
WRITE: 161 MB/Sec

3.Sys_FC:> ramspeed -b 1
RAMspeed (UNIX) v2.3.0 by Rhett M. Hollander (Alasir Enterprises), 2002-04

4Gb per pass mode

INTEGER & WRITING 1 Kb block: 6826.67 Mb/s
INTEGER & WRITING 2 Kb block: 6400.00 Mb/s
INTEGER & WRITING 4 Kb block: 6606.45 Mb/s
INTEGER & WRITING 8 Kb block: 6826.67 Mb/s
INTEGER & WRITING 16 Kb block: 6826.67 Mb/s
INTEGER & WRITING 32 Kb block: 6606.45 Mb/s
INTEGER & WRITING 64 Kb block: 6826.67 Mb/s
INTEGER & WRITING 128 Kb block: 2497.56 Mb/s
INTEGER & WRITING 256 Kb block: 2497.56 Mb/s
INTEGER & WRITING 512 Kb block: 2467.47 Mb/s
INTEGER & WRITING 1024 Kb block: 2467.47 Mb/s
INTEGER & WRITING 2048 Kb block: 2528.40 Mb/s
INTEGER & WRITING 4096 Kb block: 2884.51 Mb/s
INTEGER & WRITING 8192 Kb block: 2767.57 Mb/s
INTEGER & WRITING 16384 Kb block: 2767.57 Mb/s

3.Sys_FC:> ramspeed -b 2
RAMspeed (UNIX) v2.3.0 by Rhett M. Hollander (Alasir Enterprises), 2002-04

4Gb per pass mode

INTEGER & READING 1 Kb block: 6606.45 Mb/s
INTEGER & READING 2 Kb block: 6400.00 Mb/s
INTEGER & READING 4 Kb block: 6606.45 Mb/s
INTEGER & READING 8 Kb block: 6606.45 Mb/s
INTEGER & READING 16 Kb block: 6826.67 Mb/s
INTEGER & READING 32 Kb block: 6826.67 Mb/s
INTEGER & READING 64 Kb block: 6606.45 Mb/s
INTEGER & READING 128 Kb block: 3303.23 Mb/s
INTEGER & READING 256 Kb block: 3250.79 Mb/s
INTEGER & READING 512 Kb block: 3303.23 Mb/s
INTEGER & READING 1024 Kb block: 3303.23 Mb/s
INTEGER & READING 2048 Kb block: 3303.23 Mb/s
INTEGER & READING 4096 Kb block: 3103.03 Mb/s
INTEGER & READING 8192 Kb block: 2844.44 Mb/s
INTEGER & READING 16384 Kb block: 2805.48 Mb/s

3.Sys_FC:> ramspeed -b 3
RAMspeed (UNIX) v2.3.0 by Rhett M. Hollander (Alasir Enterprises), 2002-04

4Gb per pass mode

INTEGER Copy: 2068.69 Mb/s
INTEGER Scale: 2007.84 Mb/s
INTEGER Add: 2507.76 Mb/s
INTEGER Triad: 2372.20 Mb/s
---
INTEGER AVERAGE: 2239.12 Mb/s
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 25-Oct-2012 23:25:22 [ #15 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@KimmoK
I think to get close to the theoretical speed of the memory you have to do DMA. Using the CPU alone won't cut it, especially not using only one core. I would also guess that these memspeed testers aren't optimized for the twice-as-long pipeline that the PA6T has.
@sundown
Interesting. About the same as my readings. Are your DIMMs identical and placed in the correct slots for interleaving?
Status: Offline

sundown
Re: Dual/single DIMM memory bandwidth Posted on 25-Oct-2012 23:42:51 [ #16 ]
Elite Member
Joined: 30-Aug-2003 Posts: 5120
From: Right here...
@Deniil715
Quote:
Interesting. About the same as my readings. Are your DIMMs identical and placed in the correct slots for interleaving?
Yes, they are. DIMM1 & DIMM3 are used; DIMM2 & DIMM4 are the other pair, but empty. Remember, OS4 can only see 2GB for now.
Status: Offline

Rose
Re: Dual/single DIMM memory bandwidth Posted on 26-Oct-2012 6:45:02 [ #17 ]
Cult Member
Joined: 5-Nov-2009 Posts: 982
From: Unknown
@Deniil715
Quote:
Deniil715 wrote:
@KimmoK
I think to get close to the theoretical speed of the memory you have to do DMA. Using the CPU alone won't cut it, especially not using only one core. I would also guess that these memspeed testers aren't optimized for the twice-as-long pipeline that the PA6T has.
@sundown
Interesting. About the same as my readings. Are your DIMMs identical and placed in the correct slots for interleaving?
The performance boost from dual channel isn't as great as marketing people claim, on any platform. Here's one example with graphs.
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 26-Oct-2012 7:08:44 [ #18 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
@Rose
Yes, the effect on CPU performance is smaller when the task at hand is not heavily multithreaded and does not use RAM intensively.
But even for a synthetic memory test it seems hard to max out the bandwidth. And clearly, for CPU/RAM to video RAM tests, one needs to use some DMA tricks to get more than 10% of the bandwidth.
Current mainstream offerings have 1...4 buses to DDR3, and it is no big surprise that only two buses is the most common configuration. And, for example, all netbooks have a single bus to RAM, if I'm not mistaken.
Status: Offline

Deniil715
Re: Dual/single DIMM memory bandwidth Posted on 26-Oct-2012 15:29:00 [ #19 ]
Elite Member
Joined: 14-May-2003 Posts: 4236
From: Sweden
@Rose
Must be the first time in history (I mean these recent years) that the CPU can't max out the memory bus. Von Neumann's bottleneck has finally become wider than the bottle itself.
So now the next interesting question arises: will it actually run faster if the L2 cache is disabled? I mean, that now seems to have become an unnecessary step in the way :)
Status: Offline

KimmoK
Re: Dual/single DIMM memory bandwidth Posted on 26-Oct-2012 18:07:59 [ #20 ]
Elite Member
Joined: 14-Mar-2003 Posts: 5211
From: Ylikiiminki, Finland
@Deniil715
You forgot the latencies. Data in L2 is available sooner than data in system RAM. So, if you disable L2, you will notice a (small) drop in performance.
It seems that the 667MHz SAM is some 20% slower than similar HW with L2, even though memory is pretty fast on SAMs.
The PA6T has fast RAM and a large L1, so the performance hit might not be 20% ... perhaps 10-15%. And it depends on the application.
Last edited by KimmoK on 26-Oct-2012 at 06:11 PM.
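The latency argument in this post can be illustrated with the textbook average memory access time formula (hit time plus miss rate times the cost of going to the next level). The sketch below is only a model: every latency and miss rate in it is a made-up placeholder, not a PA6T or SAM measurement, and the function names are invented for this example.

```python
def amat(hit_ns: float, miss_rate: float, next_level_ns: float) -> float:
    """Average memory access time: hit time + miss rate x next-level access time."""
    return hit_ns + miss_rate * next_level_ns


def amat_with_l2(l1_ns: float, l1_miss: float, l2_ns: float, l2_miss: float, ram_ns: float) -> float:
    """L1 misses go to L2; only L2 misses pay the full RAM latency."""
    return amat(l1_ns, l1_miss, amat(l2_ns, l2_miss, ram_ns))


def amat_without_l2(l1_ns: float, l1_miss: float, ram_ns: float) -> float:
    """With L2 disabled, every L1 miss pays the full RAM latency."""
    return amat(l1_ns, l1_miss, ram_ns)


# Placeholder values for illustration only (not measured on any real machine).
L1_NS, L2_NS, RAM_NS = 1.0, 10.0, 100.0
L1_MISS, L2_MISS = 0.05, 0.20

print("with L2:   ", amat_with_l2(L1_NS, L1_MISS, L2_NS, L2_MISS, RAM_NS), "ns")
print("without L2:", amat_without_l2(L1_NS, L1_MISS, RAM_NS), "ns")
```

The model only covers memory access time, so the gap it shows is larger than the whole-program slowdown discussed above. It also shows why a large L1 softens the hit: the lower the L1 miss rate, the less the L2 matters.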
Status: Offline