Gunnar 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 18:30:32
#81 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Kronos

FPGAs are a development vehicle.
Every company on earth uses them for this purpose.

IBM develops their new POWER chips in FPGA.
INTEL uses FPGA to verify and test their new I-cores.
ARM does the same, AMD does the same...


Just as a baby can grow in its mother's belly,
you can develop a new CPU using an FPGA system.
Of course the goal of a baby is NOT to spend its 80 years of life in the belly of the mother...

Of course when a design is fully finished, you can give birth and then make a 10 times faster ASIC.

I'm sure you understand this...

The APOLLO 68080 development is done exactly how IBM would develop a new core.
Yes, the development of a complex CISC takes long and the verification very long.
But when it's fully done, you have the blueprints for making a new high-end gigahertz 68K.

Last edited by Gunnar on 11-Feb-2024 at 06:33 PM.

Kronos 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 18:34:13
#82 ]
Elite Member
Joined: 8-Mar-2003
Posts: 2559
From: Unknown

@Gunnar

Quote:

Gunnar wrote:

Of course when a design is fully finished, you can give birth and then make a 10 times faster ASIC.




So start s###ing or get off the toilet

I'd say the chance of a 68080 ASIC ever being a thing is minuscule.

_________________
- We don't need good ideas, we haven't run out on bad ones yet
- blame Canada

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 18:37:17
#83 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Kronos


Quote:
I'd say the chance of a 68080 ASIC ever being a thing is minuscule.


People told me before that what we do is totally impossible.
People told me it's impossible to develop a CPU better than the 68020...
And look what I did.

People said it's impossible to revive Amiga...
And still we sold over 10,000 Vampires.
Is this not already reviving?



We are Amiga fans.
Our goal is to revive Amiga.
You can believe in us or not - this makes no difference.

We do what we think is right.

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 18:48:40
#84 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@matthey

Quote:
PPC has a small advantage when executing code on data in registers due to having more registers but the 68k has a large advantage when executing code accessing data in caches/memory.



I admit, I cheat in the Apollo 68080 CPU ...

There is also a lot that is good in PowerPC, and everything the PPC is good at I also added to the 68080, but even better.

Yes, the PPC has 32 integer registers.
The APOLLO 68080 has 50% more: it has 48 integer registers.
The PPC can do 3 operand instructions; the Apollo 68080 can do any instruction with 2 and also 3 operands.
There is also a lot that is good in PowerPC... simply adding all the good to the 68K makes the 68k even stronger.


Kronos 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 18:58:00
#85 ]
Elite Member
Joined: 8-Mar-2003
Posts: 2559
From: Unknown

@Gunnar

Quote:

Gunnar wrote:


People said it's impossible to revive Amiga...
And still we sold over 10,000 Vampires.


1% done, 99% still to go.

ESCOM might have had a chance to "revive" the Amiga if they had had much more capital and way better ideas.
Everything later was just far too little, far too late.

_________________
- We don't need good ideas, we haven't run out on bad ones yet
- blame Canada

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 19:10:47
#86 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Kronos

Quote:
1% done, 99% still to go.


Kronos 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 19:31:04
#87 ]
Elite Member
Joined: 8-Mar-2003
Posts: 2559
From: Unknown

@Gunnar

How cute.


But please define a "living" Amiga platform?

Plenty of people are in various real retro communities; does that make the C64, Apple II, Atari (8-bit, consoles, ST) or the Archimedes/RiscPC alive?

And if yes, why does the Amiga need reviving?

The DTV64, Mini (S)NES and others sold far more units; did they revive the platforms?

So you have 10k users but very few [Ballmer mode] developers developers developers [/Ballmer mode], for whom you compete not only with 68000+OCS diehards and NG or EMU lovers but also with other fake retro projects like the Commander16 or Mega65.

What have we seen so far? A few halo project games that would have been better suited to the homebrew channels for Wii or Dreamcast, and some stuff that isn't really Vampire specific.

So no, you are not going to revive the Amiga. Maybe you slow the decline a bit, but more likely you just add another split to the already fractured "Amiga".

_________________
- We don't need good ideas, we haven't run out on bad ones yet
- blame Canada

kolla 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 20:40:38
#88 ]
Elite Member
Joined: 21-Aug-2003
Posts: 2876
From: Trondheim, Norway

How can one revive Amiga with a "black-box" hardware design?

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Karlos 
Re: 32-bit PPC on FPGA
Posted on 11-Feb-2024 21:11:40
#89 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4398
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Kronos

How many users does MorphOS have and among those how many are developers?

What's the minimum ratio that you consider healthy?

_________________
Doing stupid things for fun...

Hammer 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 6:39:31
#90 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5264
From: Australia

@Karlos

https://github.com/antonblanchard/microwatt
Open POWER ISA softcore written in VHDL 2008. Microwatt POWER softcore fits in a Lattice ECP5 FPGA with 85,000 LUTs.

An evaluation board for the medium-to-small ECP5 FPGA is about $200 AUD

https://au.mouser.com/ProductDetail/Lattice/LFE5UM5G-85F-EVN?qs=w%2Fv1CP2dgqoyj9CgAS78aw%3D%3D
ECP5 Evaluation Board (LFE5UM5G-85F-EVN) is about $224.20 AUD.

LFE5UM5G-85F-EVN can be expanded via Raspberry Pi GPIO.

This ECP5 FPGA needs Fast RAM and a graphics core (RTG).

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

umisef 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 13:17:19
#91 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Karlos

Quote:
Does the instruction fetch count here?


That goes to what it actually means to "execute the instruction in one cycle". And the only meaning in which that could possibly be true for this instruction is throughput.

But for throughput, pulling 10 bytes out of the instruction fetch buffer for a single instruction is certainly not something the actual 68k CPUs from Motorola can sustain for any amount of time.

What the so-called "68080" does, of course, is a mystery to me, because to the best of my knowledge, there is no "68080 user manual" describing its behaviour.

umisef 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 13:30:02
#92 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Gunnar

Quote:
The example with the instruction ADD.l #10000,myscore in one instruction is a real world result.


Did you just sneakily try to change your example to one with a 16 bit immediate?

And let's be real --- the only reason the 68080 can do that (if it can), is because it is running at 1995 clock rates, but on hardware containing modern day memory blocks.

Sure, having large amounts of fast (relative to your ALU's clock speed) memory available for free is great. But it's not something that is true for any of the PPC implementations you like to compare against --- because the ALUs in those are designed to run at considerably higher clock speeds.

If one were to implement the 32 bit PPC ISA in FPGA (and thus at a low clock rate), and wanted to optimise for performance, one would obviously also make different design choices than for full-custom-hardware, high clockrate ones.

And if one were to abandon the idea of actually sticking to the existing ISA (like the 68080), then one might make even more suited-to-FPGA design decisions. But, just as with the 68080, the point of doing so is rather questionable.

Last edited by umisef on 12-Feb-2024 at 01:49 PM.

umisef 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 13:34:51
#93 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@matthey

Quote:
It is true that instructions 3-5 may be able to be reduced from 3 to 2 instructions in some cases where the addi instruction immediate encoding bits are close to enough.


Can you maybe share an example where the "close enough" for adding 32 bits via a combination of one ADDI and one ADDIS is not a given? Each has 16 bits of immediate, so I have a hard time thinking of a 32 bit value that could not be added that way...
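
A small sketch of that split, for anyone who wants to check it: it models a 32 bit register and the two immediate adds. The one subtlety is that ADDI sign-extends its 16 bit immediate, so the high half has to be bumped by one whenever bit 15 of the low half is set. The function names are made up for illustration and are not from any assembler or toolchain.

```python
def sign_extend16(x):
    """Interpret a 16-bit field as a signed value, as ADDI/ADDIS do."""
    return x - 0x10000 if x & 0x8000 else x


def split_for_addis_addi(value):
    """Split a 32-bit constant into (hi, lo) 16-bit immediate fields."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFFF
    # ADDI sign-extends lo, so if bit 15 is set it effectively subtracts
    # 0x10000; adding 0x8000 before shifting compensates in the high half.
    hi = ((value + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def apply_addis_addi(reg, hi, lo):
    """Model ADDIS followed by ADDI on a 32-bit register."""
    reg = (reg + (sign_extend16(hi) << 16)) & 0xFFFFFFFF
    reg = (reg + sign_extend16(lo)) & 0xFFFFFFFF
    return reg


if __name__ == "__main__":
    for constant in (10000, 0x12348000, 0xFFFFFFFF, 0x80000000):
        hi, lo = split_for_addis_addi(constant)
        assert apply_addis_addi(0, hi, lo) == constant
        print(hex(constant), "->", hex(hi), hex(lo))
```

The adjusted high half is the same idea as the "high adjusted" half that PPC assemblers offer for address constants.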

umisef 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 13:47:30
#94 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Gunnar

Quote:
Of course when a design is fully finished, you can give birth and then make a 10 times faster ASIC.


At which point you'll suddenly find yourself wondering how you can make 10 times faster multi-ported memory, compared to the 315MHz M10K blocks available in the Cyclone V...

Last edited by umisef on 12-Feb-2024 at 01:50 PM.

matthey 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 17:31:37
#95 ]
Super Member
Joined: 14-Mar-2007
Posts: 1989
From: Kansas

umisef Quote:

That goes to what it actually means to "execute the instruction in one cycle". And the only meaning in which that could possibly be true for this instruction is throughput.

But for throughput, pulling 10 bytes out of the instruction fetch buffer for a single instruction is certainly not something the actual 68k CPUs from Motorola can sustain for any amount of time.

What the so-called "68080" does, of course, is a mystery to me, because to the best of my knowledge, there is no "68080 user manual" describing its behaviour.


Throughput is what matters. All but the most primitive CPU cores are going to have at least a shallow pipeline. An integer ALU operation can be performed in a single cycle but there is other CPU execution overhead that can keep an instruction with one ALU operation from executing in a single cycle without pipelining.

The example 68k instruction only performs one ALU operation which is simple. It requires handling large instruction data in the code though which you are skeptical of.

https://www.nxp.com/docs/en/data-sheet/MC68060UM.pdf Quote:

This pipeline architecture supports extremely high data transfer rates within the MC68060 processor. The on-chip instruction and operand data caches provide 600 MBytes/sec @ 50 MHz to the pipelines, while the integer execute engines can support sustained transfer rates of 1.2 GBytes/sec.
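
As a quick sanity check, the quoted rates can be converted into bytes per clock cycle. This is only arithmetic on the two figures in the quote, assuming "MBytes" and "GBytes" mean 10^6 and 10^9 bytes; the helper name is made up for illustration.

```python
def bytes_per_cycle(bytes_per_second, clock_hz):
    """Convert a sustained transfer rate into bytes moved per clock cycle."""
    return bytes_per_second / clock_hz


CLOCK_HZ = 50_000_000  # 50 MHz, as in the quote

# Assumption: decimal units (1 MByte = 10**6 bytes).
cache_rate = bytes_per_cycle(600_000_000, CLOCK_HZ)
engine_rate = bytes_per_cycle(1_200_000_000, CLOCK_HZ)

if __name__ == "__main__":
    print("caches to pipelines:", cache_rate, "bytes/cycle")
    print("integer execute engines:", engine_rate, "bytes/cycle")
```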


Even in 1994 using ~500nm silicon, SRAM/cache performance was impressive while electricity has a fraction of the distance to travel using modern silicon that is ~5nm. Logically, the limitation is not how much SRAM data is accessed but the growing requirement to process larger data. Accessing a 32 bit immediate and 32 bit address from the SRAM instruction buffer is no problem and the data is already pre-decoded to reduce further processing. The 68k instruction is a good example of the advantage of a variable length instruction encoding allowing powerful but simple instructions to execute with single cycle throughput that are not possible with a fixed length encoding.

More important than the instruction buffer bandwidth is the instruction fetch power requirement which can use 1/3 of the power of a small CPU core. Compressed data reduces the instruction fetch power used (and instruction cache power used) as the 68060 instruction fetch of only 4 bytes/cycle for a superscalar CPU demonstrates. Some power is required for the instruction buffer but they are becoming more popular for high performance RISC CPU cores using fixed length encodings as well.

The 68060 is pulling two times 6 byte instructions from the instruction buffer which is 12 bytes of instruction per cycle. The choice to limit to this is likely due to an arbitrary decision to focus performance improvements on the most frequent instructions vs the resources available which were still limited in 1994. There is also a limit of how much pre-processed data the instruction dispatcher/issuer can look at in a single stage. On the NXP forums, the real Gunnar made the case that even the ColdFire should permit 8 byte instructions commenting that timing allows the execution in a single cycle.

https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Coldfire-compatible-FPGA-core-with-ISA-enhancement-Brainstorming/td-p/238714

There are fewer and fewer instructions the larger the instruction size grows but these instructions become very powerful and can be superscalar executed with single cycle throughput rather than executing one instruction in 2 cycles. This is one of the advantages that x86-64 leverages over fixed length RISC encodings but Motorola never chose to exploit the full performance advantage of the 68k or ColdFire.

umisef Quote:

Can you maybe share an example where the "close enough" for adding 32 bits via a combination of one ADDI and one ADDIS is not a given? Each has 16 bits of immediate, so I have a hard time thinking of a 32 bit value that could not be added that way...


I see your point. ADDIS is an add immediate with 16 bit shift rather than add immediate signed. Yes, it should always be possible to load a 16 bit immediate (LI) and add a 16 bit immediate shifted left 16 bits to it to form any 32 bit number. Different RISC CPUs have different ways to form immediates/constants and I don't know PPC assembler well. Integer 64 bit immediates/constants are more of a problem. On the 68k, the immediate/constant data can simply be placed in the code and executed in a single instruction using fewer resources (the PPC code performs an unnecessary ALU operation), and the PPC instructions depend on each other, which increases latency. OoO execution will fix this when it can, and at a cost.

Last edited by matthey on 12-Feb-2024 at 06:06 PM.

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 19:05:58
#96 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@umisef

Quote:
But for throughput, pulling 10 bytes out of the instruction fetch buffer for a single instruction is certainly not something the actual 68k CPUs from Motorola can sustain for any amount of time.



No problem, I can help you with information

The Motorola 68040 can provide 8 Bytes of Instructions per clock cycle from Icache.
The Motorola 68060 was very limited and can only provide 4 Byte of instructions per clock cycle from Icache.
Only 4 Bytes is a serious limitation for the 060 and limits its performance a lot.
The Motorola 060 design team was aware of this major limitation and it was planned to bring out an 68060B
that fixes this - providing 8 Bytes per clock. But this chip never came out.


The Apollo 68080 provides 128 bits = 16 bytes per clock cycle from Icache.
This is 4 times more than the 060, and it is one of the reasons the 080 is much more powerful than the 060.

The Apollo 68080's 128 bits of instructions per clock cycle is the same number that good IBM POWER chips provide. The value is state of the art.
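
To put those fetch widths next to the 10 byte instruction from the quote above, here is a rough upper bound on sustained throughput. It is a deliberately simple model, not a description of either core: it assumes the Icache fetch width is the only limit and ignores instruction buffers, issue width and everything else. The fetch figures are the ones stated in this post, and the 10 byte length corresponds to a 2 byte opcode plus a 32 bit immediate plus a 32 bit address.

```python
def max_sustained_ipc(fetch_bytes_per_cycle, instr_bytes):
    """Upper bound on instructions per cycle if fetch width is the only limit."""
    return fetch_bytes_per_cycle / instr_bytes


# Fetch widths in bytes per cycle, as stated in the post (not measured here).
FETCH_WIDTH = {"68060": 4, "68080": 16}

# 2-byte opcode + 4-byte immediate + 4-byte absolute address.
INSTR_BYTES = 10

if __name__ == "__main__":
    for cpu, width in FETCH_WIDTH.items():
        bound = max_sustained_ipc(width, INSTR_BYTES)
        print(f"{cpu}: at most {bound} such instructions per cycle")
```

Under this model a stream of nothing but 10 byte instructions cannot sustain one instruction per cycle on 4 bytes per cycle of fetch, while 16 bytes per cycle leaves headroom.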

Karlos 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 19:49:04
#97 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4398
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

Quote:
The Motorola 68060 was very limited and can only provide 4 Byte of instructions per clock cycle from Icache.
Only 4 Bytes is a serious limitation for the 060 and limits its performance a lot.
The Motorola 060 design team was aware of this major limitation and it was planned to bring out an 68060B
that fixes this - providing 8 Bytes per clock. But this chip never came out.


What was the trade off? I assume they wouldn't have gone from 8 to 4 if it wasn't a trade off against something else bothersome.

_________________
Doing stupid things for fun...

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 20:07:04
#98 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Karlos

Quote:

Quote:
The Motorola 68060 was very limited and can only provide 4 Byte of instructions per clock cycle from Icache. Only 4 Bytes is a serious limitation for the 060 and limits its performance a lot. The Motorola 060 design team was aware of this major limitation and it was planned to bring out an 68060B that fixes this - providing 8 Bytes per clock. But this chip never came out.


What was the trade off? I assume they wouldn't have gone from 8 to 4 if it wasn't a trade off against something else bothersome.


The answer is TIME.

The 68060 was in many ways leading and very advanced at its time.

If you fully understand how the 68000 works internally,
then you know that the 68000 CISC CPU is in reality not internally a CISC CPU,
but a RISC-like core running microcode ROM, creating something like a "virtual" CISC.

Yes, the programming ISA was CISC, and to the programmer it looked like a CISC,
but internally the 68000 was not a real CISC; it was a RISC-like core doing each 68000 instruction in several steps.

So what is exciting and new in the 060?
The 68060 is the first "REAL" 68K CISC CPU.
The 060 does not use microcode anymore to emulate CISC behavior;
it's the first real 68k CISC executing most instructions in real hardware in a single cycle.

With the 060, Motorola also developed a superscalar 68K for the first time.
Superscalar means the 060 can execute more than 1 instruction per cycle.
This is great but was badly limited by the Icache,
because to be able to execute more instructions you need to load more from Icache.
The Apollo 68080 is also Super-Scalar and can execute up to 4 instructions peak per cycle.
4 instructions per clock is state of the art.
This feature of the Apollo 68080 is on par with the best IBM POWER cores.

But lets get back to the 060:

Moto did a lot cool stuff in the 060.
But the 060 is NOT perfect.
But this is normal ..

Every business project is under time pressure.
The management often pushes a hard deadline down to the people.
And it's not uncommon that parts of the development team are not able to meet the deadline,
and then the management has the option to drop CPU features or to extend the deadline.

And the dropping of features happens very often.
It happened to IBM POWER chips that I took part in developing.
And yes, it happened also to the 060.
The 060 misses a couple of features... simply because "time was over".

You all will know that the 060 lost some very useful MULTIPLICATION and DIVIDE instructions,
together with a number of more rarely used instructions like MOVEP.
Losing these instructions makes, for example, the game SPEEDBALL crash on the 060.

With the APOLLO 68080 I had no time pressure.
This means we could simply take our time to implement everything we want.
There was no management pushing the game over button.
This is why the Apollo 68080 is so much better than all Motorola CPUs.
We have a lot more time to make it better.

Karlos 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 20:17:03
#99 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4398
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

I know the 68060 was statically wired versus microcoded, but why start out with a 4 bytes/clock design in the first place? What does it accelerate in terms of development of the chip? Or is it that it's still notionally 8 bytes/clock but split between the superscalar cores, effectively halving when both are executing?

_________________
Doing stupid things for fun...

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 12-Feb-2024 20:44:53
#100 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Karlos

Quote:
Or is it that it's still notionally 8 bytes/clock but split between the superscalar cores effectively halving when both are executing?



The 68060 can load 4 byte total.


Maybe it helps if I explain some more details.

The 68000 loads 2 BYTES in 4 CYCLES
The 68020 loads 2 BYTES in 2 CYCLES
The 68030 loads 2 BYTES in 2 CYCLES

The 68040 loads 4 BYTES in 1 CYCLE, but it runs internally at double clock.
This means a 68040@25 as used in the Amiga 4000 runs internally at 50 MHz, giving it effectively 8 bytes!
The 68040 needs 2 internal clocks for each instruction,
which means it effectively does 25 million instructions per second peak.

The 68060@50, as often used in Cyberstorm cards,
runs internally at 50 MHz.

It's the first real hardware CISC, which can do an instruction in 1 cycle.
It can load 4 bytes per internal cycle, the same way as the 040.
But it effectively needs only 1 cycle per instruction and has 2 pipes.
This makes the 060 feel like it gets 1/4 of the instructions compared to the 040.
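
The 1/4 figure can be reproduced with a few lines of arithmetic. The sketch below only restates the numbers given in this post (bytes fetched per internal cycle, internal cycles per instruction, number of pipes) and does not verify them against the Motorola manuals; the function name is made up for illustration.

```python
def fetch_bytes_per_issue_slot(bytes_per_cycle, cycles_per_instr, pipes):
    """Instruction bytes fetched for each peak instruction slot."""
    return bytes_per_cycle * cycles_per_instr / pipes


# Figures as stated in the post, not checked against the manuals.
m68040 = fetch_bytes_per_issue_slot(bytes_per_cycle=4, cycles_per_instr=2, pipes=1)
m68060 = fetch_bytes_per_issue_slot(bytes_per_cycle=4, cycles_per_instr=1, pipes=2)

if __name__ == "__main__":
    print("68040 bytes fetched per instruction slot:", m68040)
    print("68060 bytes fetched per instruction slot:", m68060)
    print("68060 relative to 68040:", m68060 / m68040)
```

In other words, the fetch width per cycle is the same on both chips, but the 060 has four times as many peak instruction slots to feed from it.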


As you might know, there are cases where a high clocked 040 can outrun a 060;
these are cases where the instruction fetch deficit shows.



Another thing you might need to know.

Big companies often interleave developments.
This means that while the 040 development was in progress, the development of the next chip, e.g. the 060, might already have been started. This can lead to funny effects like CPU generation 4 having some new features that generation 5 missed, as both developments were started somewhat in parallel.
And when generation 5 was started, the lessons learned of what was good and bad in 4 were not finished.
So they had effectively no time to learn from the previous generation.

As said, Motorola's developers wanted to fix this issue and a number of other limitations in the 060B, but this never came to market.


We know all the features of all the 68K chips.
We also know all that is good and clever in today's INTEL chips,
and we of course know what is good and clever in IBM chips...

The Apollo 68080 tries to take what we think are the best ideas from all of these...
And we know exactly where the strengths and weaknesses of the Motorola 68060 are.
And of course we fixed all the weaknesses...


Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle