Amigaworld.net - The Amiga Computer Community Portal Website

MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)


BigGun

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 7:16:17

[ #201 ]

Regular Member

Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

@Hammer

I wonder what is wrong with you?

Let me ask you a question, do you ride a motorbike?

I do, to be precise I've got a chopper.
I know that all those Japanese super sports will run faster than a chopper does.
I know as well that a Harley is actually a very expensive bike.
It's much more expensive than most Japanese sport bikes.
But I also know that the feeling of riding a chopper is what I want.
I do not want to ride a super sport even if it's cheaper and faster.

I think it's the same situation for people who like Amiga OS.
We want to run AMIGA OS. Being able to get a cheap x86 laptop pre-installed with Windows does not give us much satisfaction.

Now what you are doing is coming to this AMIGA forum and jerking off with the details of your Windows box.

Hammer,
have you anything to give to the AMIGA community?
Did you write any games?
Did you port any applications?
Or are you just trying to be a smart-ass in an Amiga forum?

I assume you know that all people here are folks that want to run Amiga OS.
Amiga OS 3 runs on 68k, period.
Amiga OS 4 and MOS run on PPC.

The Natami is a very valuable idea: create a fast original Amiga, able to run the original OS.
Do you have any constructive proposal for how to improve it?
Or are you an active AROS developer wanting to inform us about AROS on x86?

If you have nothing productive to say, it would be better if you were quiet.
Hammer, it's obvious that you don't fully understand the technical background of the topics you are posting about here. If you understood them, you would not argue in such a silly way.

If you have questions or serious proposals then please post them.
But PLEASE stop trolling.

Gunnar

Last edited by BigGun on 23-Jan-2008 at 07:23 AM.

_________________
APOLLO the new 68K : www.apollo-core.com

Status: Offline

Hammer

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 9:06:32

[ #202 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@BigGun

Quote:
But PLEASE stop trolling.

I wonder who's trolling when Core 2 has 64byte cache lines NOT the claimed 128 byte cache lines.

If you are going to claim something make sure it's based on results.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

Pleng

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 9:21:21

[ #203 ]

Regular Member

Joined: 17-Nov-2005
Posts: 458
From: Unknown

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:
@Donar

Well in your comment you stated “You then can only get it used or via .torrent” and you quoted me, as I already have OS4, I’m not going to go-around pirating software unless some one like Amiga Inc comes takes it from me and if I can’t buy it from computer store or from Hyperion / ACube-Systems.

What a ridiculous comment, and one I see all too often on this forum. I'm sure when Donar said the only way "you" will be able to get OS4, he in fact meant the only way "one" will be able to get OS4. At the end of the day, when we're talking about new hardware we're talking about EXPANDING the community: allowing new people to buy systems, and existing members to upgrade. Remember, "one" may only have a 68k Amiga right now with little chance of upgrading at the moment.

And besides which, at some point your OS4 compatible hardware will either fail you, become too slow for you, or will run out of software. People developing for OS4 will eventually move away from the platform and eventually there will be some web browser, or email client, or office suite feature that you simply can't live without.

Of course, should enough developers want one, the Natami could of course help you out here. OK so you won't get any PPC Native apps but of course the 68k Apps should run just fine for you (apart from all those hardware-banging AGA++ demos of course... lol!)

Status: Offline

BigGun

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 9:32:32

[ #204 ]

Regular Member

Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

@Hammer

Quote:

Hammer wrote:
@BigGun

Quote:
But PLEASE stop trolling.

I wonder who's trolling when Core 2 has 64byte cache lines NOT the claimed 128 byte cache lines.

If you are going to claim something make sure it's based on results.

My original quote was:

Quote:

The normal way of trying to hide the latency is, as you correctly pointed out, prefetching.
This is the reason why many CPUs have huge cache line sizes of 128 bytes or more.

I never claimed that a Core Duo has a certain cache line size!
I merely pointed out that it's a common strategy to hide latency by increasing cache line size.

Everybody knows that:
A 68K or Coldfire CPU has a cache line size of 16 bytes.
The PPC G2/G3/G4 have 32 bytes.
A PA-Semi has 64 bytes.
Most POWER CPUs and the G5 have 128 bytes.
Some POWER CPUs even have a 512-byte cache line size (3rd-level cache).

Another fact I was pointing out was that there are cases where a short cache line is an advantage.

It would be nice if you could stop misquoting people.

Thanks in advance.

_________________
APOLLO the new 68K : www.apollo-core.com

Status: Offline

wolfe

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 9:58:17

[ #205 ]

Super Member

Joined: 18-Aug-2003
Posts: 1283
From: Under The Moon - Howling in the Blue Grass

@BigGun

So, let's speed up development, as "I" want one NOW. My A1200 is dying by the second . .

Dreaming on:

Put Natami on a very small mobo so it can be made portable . . .

Dreaming off:

@all

If a Pentium is your future, go for it. But I want something different than a winblows box . .

_________________
Avatar babe - Monica Bellucci.

Status: Offline

Hammer

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 10:14:40

[ #206 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@BigGun

Quote:

So you are saying that the Natami with 10ns is about three times faster than the nVidia's nForce

Intel 965 chipset has JIT scheduling, command overlap, out-of-order scheduling and opportunistic writes.

Command Overlap allows for the insertion of the DRAM commands between the
Activate, Precharge, and Read/Write commands normally used, as long as the inserted commands do not affect the currently executing command. This allows for situations where multiple commands can be issued in an overlapping manner.

Out-of-order scheduling allows requests to be reordered.

In Opportunistic Writes, Processor requests for memory reads usually are weighted more heavily than writes to memory to avoid cases of starving the processor of data to process while the writes are issued to system memory. In previous generations of Intel chipsets, writes were issued to a pending queue to be flushed to memory when certain watermarks were reached. During this write flush, if the processor needed data in system memory, it would have to wait for the write flush to finish, starving the processor of data to process. To avoid this, the Intel 965 Express chipset family monitors system memory requests and issues pending write requests to memory at times when they will not impact memory read requests, allowing for an almost continuous flow of data to the processor for processing.

Quote:

This is the reason why many CPUs have huge cache line sizes of 128 bytes or more.

Intel's Core 2 has a 64-byte cache line for both L1 and L2, and it's the dominant PC processor.

Quote:

The point was that if you have code that sets single pixels on the Natami
But prefetching does not always help. E.g. if you are pixeling a Voxel screen then you draw the screen in columns; prefetching a whole cache line will not help but hurt performance majorly.

Prefetching helps with pixel shaders (via the GPU's L1/L2 cache) and texture units (via the GPU's texture cache). The G80 has texture prefetch and a cache to feed the texture units.

NV’s GigaThreads and ATI’s HyperThreads are other latency-hiding techniques.

Drawing pixels is the main “bread and butter” for a SIMD/MIMD/VLIW/scalar streaming co-processor array.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

BigGun

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 10:23:02

[ #207 ]

Regular Member

Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

Some people mentioned UAE, as basis for 68k emulation.

There is one fact that I would like to point out.
As most of you will certainly know, the 68k emulation of Win-UAE needs on average 20-50 instructions to emulate ONE SINGLE 68k instruction!
And these are only for emulating the instruction (e.g. in JIT mode).
In addition to this, there is first a big amount of work to recognize the 68k instruction.

As you all know, the 50 MHz 68060 has a maximum throughput of 100 million 68k instructions per second. This means that the JIT UAE on x86 will need 2,500 - 5,000 million instructions to emulate the 060.

A V4 Coldfire ($30) has a maximum throughput of 530 million Coldfire/68k instructions per second.
A JIT UAE on x86 will need 13,000 - 27,000 million instructions to emulate the coldfire.
In other words: the Coldfire is MUCH faster in executing 68k-coldfire code than any existing x86 CPU can do it in emulation!
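
The arithmetic behind these figures can be sketched as below. This is only an illustration under the post's own inputs (peak native throughput, and 20-50 host instructions per emulated instruction); whether those inputs hold is exactly what is being argued about. The function name `emulation_budget` is hypothetical. Note that with the quoted 20-50 range the lower bounds come out as 2,000 and 10,600 million, slightly below the rounded figures given in the post.

```python
def emulation_budget(native_mips, host_per_guest_low, host_per_guest_high):
    """Host instructions per second (in millions) needed to match a guest CPU.

    native_mips: guest throughput in millions of instructions per second.
    host_per_guest_low/high: assumed host instructions spent per guest instruction.
    """
    return native_mips * host_per_guest_low, native_mips * host_per_guest_high


# Peak figures as stated in the post (assumptions, not measurements).
M68060_PEAK_MIPS = 100       # 50 MHz x 2 instructions per clock
COLDFIRE_V4_PEAK_MIPS = 530

print(emulation_budget(M68060_PEAK_MIPS, 20, 50))       # (2000, 5000)
print(emulation_budget(COLDFIRE_V4_PEAK_MIPS, 20, 50))  # (10600, 26500)
```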

Yes, we know a Coldfire will not be able to run old 68k code at full speed, as the Coldfire will need to emulate some instructions itself.
But for many AMIGA apps that are still being developed, it's no problem to compile them for Coldfire.
For reference, compiling my demo game 194x took me only half an hour.

The Coldfire will run such a Coldfire-68k binary much faster than UAE can.
If you think that your Win-UAE is fast, then you will be surprised how fast and low-noise a Coldfire with good memory can be.

Cheers

_________________
APOLLO the new 68K : www.apollo-core.com

Status: Offline

BigGun

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 10:26:43

[ #208 ]

Regular Member

Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

@Hammer

> Lots of INTEL fan boy bla bla

Hammer,

- Can you run OS3 or OS4 on your Core Duo?
- If not, do you plan to port Amiga OS to it?

- Do you have a CyberGFX or Picasso driver for your x86 GFX chip?
- If not, will you write them for it?

If you can not run AMIGA OS on your x86 hardware then why are you writing about it in an Amiga-forum?

_________________
APOLLO the new 68K : www.apollo-core.com

Status: Offline

Hammer

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 10:34:36

[ #209 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@BigGun

Quote:

(SNIP for space)
I never claimed that a Core Duo has a certain cache line size!
I merely pointed out that it's a common strategy to hide latency by increasing cache line size.

My statement was made in the context of X86 (~210 million unit per year) and PPC/POWER (~70 million units per year).

With Core 2 being the dominant processor among "fat" front-end CPUs, would the statement "it's a common strategy to hide latency by increasing cache line size", with 128-byte cache lines, be true?

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

Hammer

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 10:51:32

[ #210 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@BigGun

Quote:

BigGun wrote:
@Hammer

> Lots of INTEL fan boy bla bla

Let me see... 100 MHz with 10ns...

What's important is benchmarks and results.

Quote:

Hammer,

- Can you run OS3 or OS4 on your Core Duo?

AOS3 can be made to run on X86 via JIT emulation and there's AROS.

Quote:

- If not, do you plan to port Amiga OS to it?

My statements have nothing to do with the OS.

Quote:

- Do you have a CyberGFX or Picasso driver for your x86 GFX chip?

CPU centric is not quite classic Amiga i.e. let the co-processors do the heavy work.

Quote:

- If not, will you write them for it?

If you can not run AMIGA OS on your x86 hardware then why are you writing about it in an Amiga-forum?

Erm... recall Amikit/WinUAE/QuarkTex...

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

Hammer

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 11:00:36

[ #211 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5616
From: Australia

@BigGun

Quote:

BigGun wrote:
Some people mentioned UAE, as basis for 68k emulation.

There is one fact that I would like to point out.
As most of you will certainly know, the 68k emulation of Win-UAE needs on average 20-50 instructions to emulate ONE SINGLE 68k instruction!
And these are only for emulating the instruction (e.g. in JIT mode).
In addition to this, there is first a big amount of work to recognize the 68k instruction.

As you all know, the 50 MHz 68060 has a maximum throughput of 100 million 68k instructions per second. This means that the JIT UAE on x86 will need 2,500 - 5,000 million instructions to emulate the 060.

A V4 Coldfire ($30) has a maximum throughput of 530 million Coldfire/68k instructions per second.
A JIT UAE on x86 will need 13,000 - 27,000 million instructions to emulate the coldfire.
In other words: the Coldfire is MUCH faster in executing 68k-coldfire code than any existing x86 CPU can do it in emulation!

Yes, we know a Coldfire will not be able to run old 68k code at full speed, as the Coldfire will need to emulate some instructions itself.
But for many AMIGA apps that are still being developed, it's no problem to compile them for Coldfire.
For reference, compiling my demo game 194x took me only half an hour.

The Coldfire will run such a Coldfire-68k binary much faster than UAE can.
If you think that your Win-UAE is fast, then you will be surprised how fast and low-noise a Coldfire with good memory can be.

Cheers

What's the AOS3 Quake (software render) FPS?

Other AOS3 benchmarks are; Cinema 4D raytrace, LAME MP3 encode/decode, DIVX-HD encode/decode and 'etc'.

Last edited by Hammer on 23-Jan-2008 at 11:08 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

Donar

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 12:36:09

[ #212 ]

Regular Member

Joined: 12-Nov-2006
Posts: 117
From: Germany

@NutsAboutAmiga

Quote:
Well in your comment you stated “You then can only get it used or via .torrent” and you quoted me, as I already have OS4, I’m not going to go-around pirating software unless some one like Amiga Inc comes takes it from me....
I now actually think you do not want to understand what people say.

Quote:
...and if I can’t buy it from computer store or from Hyperion / ACube-Systems.
I actually think that part was covered by me when I wrote:
Sorry, but if AInc wins they will bury the sourcecode and stop the sales of AOS 4.0.
Again see above...

Last edited by Donar on 23-Jan-2008 at 12:37 PM.

_________________
<- Amiga 1260 / CD ->
Looking for:
A1200/CF CFV4/@200,256MB,eAGA,SATA,120GB,AROS

Status: Offline

ChaosLord

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 12:36:45

[ #213 ]

Cult Member

Joined: 4-Apr-2005
Posts: 782
From: Houston, Texas USA

@Hammer

Intentionally misquoting people is lame.

How am I supposed to believe anything you say about complicated obtuse technical issues when you can't even understand plain simple english sentences written by BigGun?

_________________
Wanna try a wonderfull magical Amiga strategy game?
Total Chaos AGA

Status: Offline

Donar

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 13:15:27

[ #214 ]

Regular Member

Joined: 12-Nov-2006
Posts: 117
From: Germany

@BigGun

Quote:
Yes, we know a Coldfire will not be able to run old 68k code at full speed, as the Coldfire will need to emulate some instructions itself.

Any idea how to deal with instructions that are implemented differently on 68k and Coldfire? At least Thomas seems to be thinking about using OS3.9 on this machine, so this could be a problem. Even if everything starts up ok, there could be loops that will never end, or your applications could give false results ...

A solution could be to create a Coldfire-native "Core" (AR)OS that starts a 68k emulator for non-Coldfire-native code, at least until a full-blown 68k/CF AROS version is available. I take the opportunity to remind developers that there is $1111 to grab for bringing the 68k branch of AROS out of unmaintained status, which could also be compiled for CF.

Quote:
...you will be surprised how fast and low noise a Coldfire with good memory can be.
I actually like the idea of using a Coldfire as it is cheap and relatively fast; compared to a 68060 it flies... And hopefully this solution would be in a price range where I could consider getting it "for fun", contrary to the CELL + XDR RAM solution.

Last edited by Donar on 23-Jan-2008 at 01:37 PM.

_________________
<- Amiga 1260 / CD ->
Looking for:
A1200/CF CFV4/@200,256MB,eAGA,SATA,120GB,AROS

Status: Offline

Kronos

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 14:21:44

[ #215 ]

Elite Member

Joined: 8-Mar-2003
Posts: 2615
From: Unknown

@Hammer

Quote:

Hammer wrote:
@BigGun

Quote:

What in "as long as you DO NOT USE TOO MANY bitplanes" do you not understand?
Everybody knows that you could use for free 4 planes in lowres or 2 plane if hires.
Are you just arguing for the sake of it?

I don’t recall running my Amiga500 like a monochrome Mac.

Who was talking about "monochrome"???

The Amiga chipset (OCS/ECS) is capable of running 2 bitplanes in HiRes without having to steal extra cycles from the CPU; use more and you will feel the slowdown (especially on a system without real Fast-MEM).

Guess why 4 colors were the preset all the way from 1.0 through 2.1 .....

_________________
- We don't need good ideas, we haven't run out on bad ones yet
- blame Canada

Status: Offline

umisef

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 14:37:35

[ #216 ]

Super Member

Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@BigGun

Quote:
There is one fact that I would like to point out.

Actually, let me point out a few. Starting with your "fact", though....

Quote:
As most of you will certainly know, the 68k emulation of Win-UAE needs on average 20-50 instructions to emulate ONE SINGLE 68k instruction!
And these are only for emulating the instruction (e.g in JIT mode)

"Most of [us]" may know that, but then most of us would be dead wrong, by about an order of magnitude. As should be blindingly obvious to anyone who has ever actually looked at the benchmarks.
As you point out yourself, the *maximum* theoretical throughput for non-branch instructions of the 68060 is two per clock cycle. Realistically, let's say you get one in real life. That would then require 1 to 2.5 billion x86 instructions just to match the 060/50, if your numbers were right. Which is pretty much the range you could get from a 1.3GHz Athlon XP back in 2001. But looking at the benchmarks, I find that the 060/50 gets outperformed by around a factor of 10, more in some cases, less in others.
Which *I* personally don't find particularly surprising, seeing as I wrote the damn JIT and have a pretty good idea of the number of x86 instructions per 68k instruction. It *is* a bit disappointing to see you make such silly inaccurate statements, when the code is there for you to look at.

Quote:
The Natami has even lower latency than the AMD "on chip cache" !

Just one example of your "10ns" crusade, which I am sure Thomas would be horrified by.

First of all, 10ns is *not* the turnaround time from the CPU wanting data to the CPU getting the data. It is the time from the moment the address lines are stable at the memory chip, to the time that the chip's response is stable at the memory chip. You need to add signal travel times (a nanosecond is only 30cm at the speed of light in a vacuum, considerably less at the speed of electrical signals in metals), as well as signal rise and fall times (which are due to unavoidable capacitive and inductive loads generated during the travel).
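
As a rough check of the "30cm per nanosecond" figure, the distance a signal covers in a given time can be computed as below. This is a sketch only: the function name is hypothetical, and the velocity factor of 0.5 used for a board trace is an illustrative assumption, not a measured value for any particular board.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def travel_distance_cm(time_ns, velocity_factor=1.0):
    """Distance in cm covered in time_ns at velocity_factor times c."""
    metres = SPEED_OF_LIGHT_M_PER_S * velocity_factor * time_ns * 1e-9
    return metres * 100


print(round(travel_distance_cm(1.0), 1))       # 30.0 (vacuum)
print(round(travel_distance_cm(1.0, 0.5), 1))  # 15.0 (assumed factor of 0.5)
```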

Next, that "chip mem" is, by its very nature, shared. Which means it's not just the CPU which can drive the address lines, and not just the CPU which is hooked up to the data lines. Heck, it's not even only the CPU that can drive the control signals. So now you have multiplexers and/or tri-state buffers sitting around, adding gate delays, adding capacitive load. And half the time, the CPU has to wait a full memory cycle (i.e. 10ns+++) before it is allowed to drive anything, because the "chips" are using it half the time.

Which means your average latency is now probably somewhere in the 30-40ns range. Which is still pretty good, but far from spectacular. Of course, that is the latency between the moment the CPU actually works out that it needs to access external memory (which in itself requires quite some time for MMU translations and checking the internal caches), and the time data arrives at the CPU. Actually *doing* something with it (like, say, chasing pointers) will require additional time to use the data, store it in a register, get the next instruction from the instruction decoder, and so on. At the end of all that, I'd be very surprised if the Natami can chase pointers in its "chipmem" any faster than the machine I am typing this on can do it in its bog standard DDR2 main memory.

Of course, making essentially random accesses into memory is where the (necessarily uncached) chipmem looks best. Imagine running code from it --- you'd get the full latency for fetching each and every instruction. *shudder*

But to come back to your claim:

Quote:
The Natami has even lower latency than the AMD "on chip cache" !

When was the last time you looked at an AMD?

One core on the machine I am typing on can (and does) chase 132 million pointers per second (arranged in a cache-killing butterfly pattern) in L2. That's a *proven* latency, everything included, of less than 8ns. Compare that to even your rather naive idea of what "10ns SRAM" means, and you'll find that even your rather generous view of Natami has it losing out to the L2 of the AMD. Which isn't all that surprising, either, seeing as you pointed out yourself that discrete SRAM chips at some point were used for L3 caches --- it would be odd to make yesterday's L3 faster than today's L2.
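
The step from "132 million pointers per second" to "less than 8ns" is a simple reciprocal, because in a pointer chase each access depends on the result of the previous one, so the accesses cannot overlap. A minimal sketch (the function name is hypothetical):

```python
def latency_ns(dependent_accesses_per_second):
    """Average time per access, in ns, when each access must wait for the previous one."""
    return 1e9 / dependent_accesses_per_second


print(round(latency_ns(132e6), 2))  # 7.58
```

The same conversion does not apply to independent accesses, which hardware can overlap, so a throughput figure for those would only give an upper bound on the time per access.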

Quote:
But prefetching does not always help. E.g. if you are pixeling a Voxel screen then you draw the screen in columns; prefetching a whole cache line will not help but hurt performance majorly.

If you are "pixeling" directly to video memory, then obviously caches don't come into it. And again, on the machine I am typing this on, I can do 65 million individual writes per second into an 8M area of the consumer-grade $100 8600GT graphics card. And these are 65 million separate bus transactions (as the addresses are nowhere near consecutive, and thus the write combiner couldn't trigger even if it was enabled, which it wasn't), actually measured. PCI-e is a point-to-point "bus" (quotation marks because the whole point is that it is *not* a bus, but rather a point-to-point connection architecture), with dedicated lanes for both up- and downstream traffic. In such an architecture, buffers can be, and are, ubiquitous.

If instead you are "pixeling" to main memory, prefetching most certainly *does* help. Yes, you need to pipeline things a bit to get optimum performance, but interleaving a bit of calculation between the prefetch and the write will allow you to completely hide the latency. There is a reason for being able to have something like half a dozen prefetches in flight at any given time. Believe me, the thing that puts food on my table makes extensive use of just that technology.
And that's assuming you'd *want* to use the cache. The better approach would probably be to make the writes so-called non-temporal writes, which completely bypass the cache, and thus don't require a cacheline-sized read for each write.
Of course, the *actual* way to do this would be to render the whole thing "lying on its side", which makes it trivial to prefetch all of the next "row's" cachelines while working on the current one, and then to have a clever copy routine which transfers cache-friendly tiles to screen in a write-combine-friendly way, rotating them in the process.
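
The "lying on its side" idea can be sketched as follows. Each screen column is rendered as a contiguous row of an off-screen buffer, and a tiled copy then transposes that buffer into screen order. This is only an illustration of the access pattern: Python cannot show the cache or write-combining effects, the function name and tile size are hypothetical, and a plain transpose stands in for whatever rotation a real routine would use.

```python
def transpose_tiled(src, width, height, tile=8):
    """Transpose a flat row-major buffer (height rows x width columns), tile by tile.

    Returns a flat row-major buffer of width rows x height columns.
    Working in small tiles keeps both the reads and the writes within
    a limited address range at any one time.
    """
    dst = [0] * (width * height)
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    dst[x * height + y] = src[y * width + x]
    return dst


# Two screen columns, each 3 pixels tall, rendered as two contiguous rows.
sideways = [0, 1, 2,
            3, 4, 5]
screen = transpose_tiled(sideways, width=3, height=2, tile=2)
print(screen)  # [0, 3, 1, 4, 2, 5], i.e. 3 screen rows of 2 pixels each
```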

Quote:
The point was that if you have code that sets single pixels on the Natami.
Like a Star Routine, Texturemapper, or Voxel routine
then the low latency of the Natami is a major advantage over any other GFX card.

65 million writes per second, actually measured. Compared to what for the Natami?

Quote:
If you think that your Win-UAE is fast, then you will be surprised how fast and low-noise a Coldfire with good memory can be.

I think this sums up your side of this thread perfectly. You compare an "is" to a "can be". The fact that you misrepresent the "is", and that the "can be" only exists in your naive view of what numbers on spec sheets mean, is just damning your arguments further. But even without it, the comparison of "is" to "can be" would be extremely telling.

Last edited by umisef on 23-Jan-2008 at 03:02 PM.

Status: Offline

BigGun

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 17:07:01

[ #217 ]

Regular Member

Joined: 9-Aug-2005
Posts: 438
From: Germany (Black Forest)

@umisef

Quote:

umisef wrote:
@BigGun

Quote:
There is one fact that I would like to point out.

Actually, let me point out a few. Starting with your "fact", though....

Quote:
As most of you will certainly know, the 68k emulation of Win-UAE needs on average 20-50 instructions to emulate ONE SINGLE 68k instruction!
And these are only for emulating the instruction (e.g in JIT mode)

"Most of [us]" may know that, but then most of us would be dead wrong, by about an order of magnitude. As should be blindingly obvious to anyone who has ever actually looked at the benchmarks.

Okay, last time I looked at the UAE sources there were around 30-50 instructions
used per function that emulates one 68k instruction. I think the code in question was in "cpuemu.c".
Can you explain this?

Quote:

As you point out yourself, the *maximum* theoretical throughput for non-branch instructions of the 68060 is two per clock cycle. Realistically, let's say you get one in real life. That would then require 1 to 2.5 billion x86 instructions just to match the 060/50, if your numbers were right. Which is pretty much the range you could get from a 1.3GHz Athlon XP back in 2001. But looking at the benchmarks, I find that the 060/50 gets outperformed by around a factor of 10, more in some cases, less in others.

Maybe the reason why the 060 in your orig was slower is that the 060 has a small cache and relatively slow memory compared to your emulating system, which has a huge cache?

Quote:

Which *I* personally don't find particularly surprising, seeing as I wrote the damn JIT and have a pretty good idea of the number of x86 instructions per 68k instruction. It *is* a bit disappointing to see you make such silly inaccurate statements, when the code is there for you to look at.

Please see above. Maybe you can explain what I saw rather than just calling it silly.
The UAE code that I looked at creates huge blocks per 68k instruction.

Quote:
The Natami has even lower latency than the AMD "on chip cache" !

Just one example of your "10ns" crusade, which I am sure Thomas would be horrified by.

First of all, 10ns is *not* the turnaround time from the CPU wanting data to the CPU getting the data.
It is the time from the moment the address lines are stable at the memory chip, to the time that the chip's response is stable at the memory chip.

Sorry but your claim assumes that we use 10ns SRAM. Natami uses SRAM faster than 10ns.

Quote:

Next, that "chip mem" is, by its very nature, shared. Which means it's not just the CPU which can drive the address lines, and not just the CPU which is hooked up to the data lines. Heck, it's not even only the CPU that can drive the control signals.

What you say is right based on your knowledge of the Natami.
But that is not how the actual design works. There is more than one memory bank, and one bank is CPU-only SRAM, so there are cases where the CPU has full access to its SRAM bank.
But I think it would be better to wait for the update of Thomas's website to show you this.

Quote:

Of course, making essentially random accesses into memory is where the (necessarily uncached) chipmem looks best. Imagine running code from it --- you'd get the full latency for fetching each and every instruction. *shudder*

see above

Quote:

But to come back to your claim:
Quote:
The Natami has even lower latency than the AMD "on chip cache" !

When was the last time you looked at an AMD?

One core on the machine I am typing on can (and does) chase 132 million pointers per second (arranged in a cache-killing butterfly pattern) in L2. That's a *proven* latency, everything included, of less than 8ns. Compare that to even your rather naive idea of what "10ns SRAM" means,

The Natami does not use 10ns SRAM but faster SRAM.
The goal was to drive the bus access in 10ns.
The goal of the blitter is to do a 100% random texture fetch in the SRAM in 10ns, in a much larger amount of memory than your AMD CPU has.

The SRAM is incredibly fast if you look at it from the 68k CPU's point of view.
A 68060 could of course do one random bus access in every bus clock.
But your AMD can NOT access its 2nd-level cache every CPU clock.
You can probably tell me how many clocks of delay it has?

[quote ]it would be odd to make yesterday's L3 faster than today's L2.[/quote]

You are missing an key factor here.
Cache in a CPU is not memory.
There is no direct access of the CPU cache for the CPU but there is address comparision involved.
In a multi core CPU you will even have cache syncronisation in addition to this.

Direct access to SRAM saves all this.
This is why the CELL uses SRAM for the SPUs, as the SRAM is 10 times faster than cache would be.
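
To make the "address comparison" point concrete, here is a toy model, not a description of any real CPU or of the Natami. It contrasts a direct-mapped cache, where every access has to compare a stored tag before it knows whether the data is present, with a directly addressed SRAM, where the address selects the cell. The line size, line count and class names are all assumptions chosen for illustration, and the model says nothing about actual timing.

```python
LINE_BYTES = 16   # ASSUMPTION: illustrative cache line size
NUM_LINES = 256   # ASSUMPTION: illustrative number of cache lines


class DirectMappedCache:
    """Toy direct-mapped cache: every access needs a tag comparison."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        line_number = address // LINE_BYTES
        index = line_number % NUM_LINES
        tag = line_number // NUM_LINES
        if self.tags[index] == tag:  # the address comparison step
            return True
        self.tags[index] = tag       # miss: line is fetched from memory
        return False


class DirectSram:
    """Toy directly addressed SRAM: the address selects the cell itself."""

    def __init__(self, size: int):
        self.cells = bytearray(size)

    def read(self, address: int) -> int:
        return self.cells[address]   # no tag, no hit/miss decision


cache = DirectMappedCache()
first = cache.access(0x1000)    # cold miss
second = cache.access(0x1004)   # same line as before: hit
# Same index but a different tag: evicts the line loaded above.
conflict = cache.access(0x1000 + LINE_BYTES * NUM_LINES)
```

The sketch only shows where the extra step sits in the access path; how much time it costs in real hardware is not something it can confirm.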

Quote:
But prefetching does not always help. E.g. if you are pixeling a voxel screen, then you draw the screen in columns; prefetching a whole cache line will not help but hurt performance majorly.

If you are "pixeling" directly to video memory, then obviously caches don't come into it. And again, on the machine I am typing this on, I can do 65 million individual writes per second into an 8M area of the consumer-grade $100 8600GT graphics card.

How do you measure this?
Were you only writing randomly, or were you randomly reading too?

Quote:

And that's assuming you'd *want* to use the cache. The better approach would probably be to make the writes so-called non-temporal writes, which completely bypass the cache, and thus don't require a cacheline-sized read for each write.

So how do you do this on 68k and PPC?
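
The column-drawing argument in this exchange can be illustrated with a small count of how often consecutive pixel writes land in a different cache line. This is only a sketch under assumed numbers: a 320x200 screen, one byte per pixel, a linear framebuffer and 32-byte lines are all hypothetical choices, and `lines_touched` is a made-up helper name.

```python
def lines_touched(width, height, bytes_per_pixel, line_bytes, column_major):
    """Count how often consecutive pixel writes move to a different line."""
    if column_major:
        order = ((x, y) for x in range(width) for y in range(height))
    else:
        order = ((x, y) for y in range(height) for x in range(width))
    switches = 0
    previous_line = None
    for x, y in order:
        address = (y * width + x) * bytes_per_pixel
        line = address // line_bytes
        if line != previous_line:
            switches += 1
            previous_line = line
    return switches


# ASSUMPTION: screen geometry and line size are illustrative values only.
row_switches = lines_touched(320, 200, 1, 32, column_major=False)
column_switches = lines_touched(320, 200, 1, 32, column_major=True)
print(row_switches, column_switches)
```

With these assumed values every single write in column order lands in a different line than the one before it. The count is an upper bound on misses, not a miss count: a cache big enough to keep one line per screen row resident would turn many of these switches into hits, and writes that bypass the cache avoid the line fill altogether.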

Quote:

I think this sums up your side of this thread perfectly. You compare an "is" to a "can be". The fact that you misrepresent the "is", and that the "can be" only exists in your naive view of what numbers on spec sheets mean, is just damning your arguments further. But even without it, the comparison of "is" to "can be" would be extremely telling.

Given that you don't know the design and are only reading things into it, you are speaking very aggressively here.

The Natami is, in my humble opinion, the best approach so far to creating an "original AMIGA" that is hardware and software compatible with the original ones, but as fast as you can get without having the industrial manufacturing resources of INTEL and NVIDIA combined.

If you have ideas to improve the design, feel free to speak up.
As I said from the beginning: you cannot compare a self-developed piece of hardware with some GFX card or CPU which companies are putting millions into developing.
But as these CPUs don't run AmigaOS and these GFX chips do not have Amiga drivers, comparing them makes little sense.
I pointed out that the Natami pulls some tricks to get performance.
I clearly said that the Natami will not beat your standard hardware in all benchmarks.
But there will be cases where the Natami will be faster than PC systems.
I think you agree with me on this?

Gunnar

_________________
APOLLO the new 68K : www.apollo-core.com

Status: Offline

NutsAboutAmiga

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 19:40:30

[ #218 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12857
From: Norway

@BigGun

Quote:
the JIT UAE on x86 will need 2,500 - 5,000 million instructions to emulate the 060.

But on PowerPC you should need fewer (no big-endian versus little-endian problems), and without emulating the chipset in software, the speed should be quite good. If you're able to optimize the OS system code and write drivers, the speed should increase even more. Now let's say you were able to get AmigaOS4 running on it: then everything would be running native PowerPC code, and 680x0 code would run on the JIT, so 680x0 speed should be almost the same as native PowerPC programs.
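
A small sketch of the endianness point: 68k memory stores the most significant byte first, so a host with the opposite byte order has to swap bytes on every multi-byte access, while a big-endian host can use the value as it is. The helper names below are hypothetical and this is not how any particular emulator implements it; it only shows the extra step.

```python
import struct

# A 68k longword as it sits in memory (big-endian): value 0x12345678.
guest_memory = bytes([0x12, 0x34, 0x56, 0x78])


def read_long_big_endian(memory: bytes, offset: int) -> int:
    """What a big-endian host sees when it loads the longword natively."""
    return struct.unpack_from(">I", memory, offset)[0]


def read_long_little_endian(memory: bytes, offset: int) -> int:
    """What a little-endian host sees when it loads the same bytes natively."""
    return struct.unpack_from("<I", memory, offset)[0]


def byte_swap32(value: int) -> int:
    """The extra step a little-endian host needs to recover the 68k value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


native_big = read_long_big_endian(guest_memory, 0)
native_little = read_long_little_endian(guest_memory, 0)
recovered = byte_swap32(native_little)
print(hex(native_big), hex(native_little), hex(recovered))
```

How much this swap actually costs in a JIT depends on the host CPU and the code generator, which the sketch does not try to measure.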

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Status: Offline

TheDaddy

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 20:01:31

[ #219 ]

Elite Member

Joined: 30-Sep-2005
Posts: 4499
From: Quattro Stelle

@ALL

Please let's stop the technical crap. This thread is about a new Amiga, NOT an emulated one; it is about a great, original and commendable project, not about a PC running an emulator.

This could be a new Amiga, a real one.

If you want to run OS3.something emulated on a pc you are free to do so.

I have been using a PC running emulated Workbench and I can assure you that, although it does a decent job and is faster than any Amiga, it still doesn't feel like a real Amiga, not to mention the dozens of freezes and crashes.

So let's go back to Natami and support Thomas instead of giving him a reason NOT to carry on.

Happy with your pc? Fine, We talk Amiga here.

_________________
www.loriano.pwp.blueyonder.co.uk

Status: Offline

mike

Re: MeKa 2008 (Amiga Party) (SHOWN WAS NEW AMIGA HW!)
Posted on 23-Jan-2008 21:19:20

[ #220 ]

Regular Member

Joined: 31-Jul-2007
Posts: 406
From: Alpha Centauri

@TheDaddy

Quote:

Happy with your pc? Fine, We talk Amiga here.

I agree.

Man, I find myself reading through this thread and wondering how the hell you come up with some of this. Even I got hot-headed when reading this; I can't imagine how BigGun (Gunnar) has managed to keep his cool.. I certainly wouldn't...

If you all want emulation, look up e-uae, that's the best emulation out there, but you're probably all Windows users anyway, so go have sex with winuae, bill gates and mr.ballmer for a while and come back when you're not satisfied with that anymore...

Moving the Amiga over to the ColdFire will revolutionize the entire Amiga scene; having a Cell CPU as a mistress on the side is all we could ever dream about. Do you see any other community (except the PS3, which doesn't count because they can't use it for anything other than games) with a Cell CPU?

This will be great when it arrives, and as Gunnar said, any suggestions are welcome, but pushing another developer so far that he leaves or abandons the whole Amiga scene, again, would probably cause a riot..

Last edited by mike on 23-Jan-2008 at 09:23 PM.
Last edited by mike on 23-Jan-2008 at 09:22 PM.

_________________
C= Amiga addict
,,,
(Oo)
⎛☮ໄ
ﮑὠՀ
Couldn't care less what other people think, seeing that there's concrete evidence they don't.

Status: Offline


Amigaworld.net was originally founded by David Doyle