Hammer 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 1:25:25
[ #141 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@matthey

Quote:

matthey wrote:
I wrote a 68k CopyMem() and CopyMemQuick() patch for the 68060.

http://aminet.net/util/boot/CopyMem.readme

35.54% speedup for CopyMem060 v1.1 (vs AmigaOS 3.9)

It's not worthwhile to try to use the blitter for chip mem and few copies are large enough for MOVE16 to make a difference. The only reason why the speedup is so large is because the AmigaOS is compiled for an old CPU like the 68000. What is the point anyway when the AmigaOS is compiled for a 68000? That is the kind of treatment the 68k receives.

I improved the performance of Warp3D by roughly 10% (much more for certain programs) by disassembling the libraries and optimizing them with much more to go. The Warp3D.library could be half the size. Memory trashing bugs are not fixed and the 68k doesn't get the newest major version. PPC 3D is in better shape than 68k as some people have commented. That is the kind of treatment the 68k receives.

Emulation of the 68k on ARM has a fraction of the performance of native code. The cheap and weak ARM Cortex-A53 that suffers 3 cycle load-to-use penalties often seems to be the CPU of choice for emulation. Optimizations for an Amiga virtual machine are pointless and there is little incentive for developer support. Hardware is ignored when it used to be the most important part of the Amiga success. That is the kind of treatment the 68k receives.

The retro 68k Amiga market is many times larger than the PPC AmigaNOne market but is ignored. The PPC AmigaNOne hardware is subsidized for micro production while 68k Amiga hardware is ignored and practically blocked despite hundreds of thousands of units sold. That is the kind of treatment the 68k receives.

RPi 3A+ (ARM Cortex-A53) or RPi 4B (ARM Cortex-A72) under PiStorm-Emu68 is still faster and cheaper than either a 68060 rev 6 @ 100 MHz or AC68080-based solutions.

I'm aware of the AGA clone Cyclone 10 FPGA project with support for PiStorm.

The main advantage of 68060 is 68K MMU support.

The incentive for the AmigaOS 3.X 68K target is universal for the Amiga-compatible platforms.

Amiga's NLE (non-linear editing) software is boat-anchored on specific 3rd party hardware and it's not transferable.

ARM is the real-world cure for Motorola's 68K/PPC and Commodore's MOS 65xx stupidity.

Last edited by Hammer on 14-Feb-2024 at 01:36 AM.
Last edited by Hammer on 14-Feb-2024 at 01:34 AM.
Last edited by Hammer on 14-Feb-2024 at 01:31 AM.
Last edited by Hammer on 14-Feb-2024 at 01:28 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Hammer 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 2:39:49
[ #142 ]
Elite Member
Joined: 9-Mar-2003
Posts: 5273
From: Australia

@Gunnar

Quote:

1) Icache can fetch 16 BYTE per cycle.
These 16 BYTE can span 2 cache rows.

I'm already aware of this information for AC68080.

Quote:

This means the 68080 does NOT need to align loop in memory to have full speed.
The 68080 is the first 68k CPU with this feature.

Any low-hanging fruit improvements are good.

Quote:

In parallel to providing instructions to the Decoders
the APOLLO 68080 Icache reload more instructions from memory to Icache.
This is a feature greatly increasing performance.

Modern CPUs have prefetch into cache features. This feature can be implied (speculative via hardware) or explicit (software e.g. PREFETCHW). One of several methods for minimizing pipeline stalls.

Quote:

2) The Decoders can decode up to 4 instructions per cycle.
But frankly 4 instruction in parallel per cycle is an uncommon case.
Normally code runs around 2 instructions per clock.

It depends on the out-of-order and prefetch capability. AMD Jaguar's dual instruction decoder out-of-order design wouldn't match the larger scale AVX capable X86-64 microarchitectures.

Intel's E-Core has three decoders and they have inferior performance when compared to larger-scale P-Cores' six decoders (Golden Cove).

Modern CPUs have a uop cache, which lets the external ISA decoders be used less.



In-order CPU designs are limited in this regard.

PS; AMD Piledriver is pathetic and removing Hector Ruiz and Bulldozer advocates from AMD is good.

It's unwise to state that the "2 instructions per clock" is enough. It's never enough, hence why other CPU companies have road maps. To minimize the Osborne effect, AMD or Intel sells the platform i.e. the customer buys into a platform with CPU upgrade promises.

Quote:

3) the Apollo 68080 has TWO EA units.
Each EA unit can calculate address of using memory (read/write) and
can execute instructions like LEA, ADDA, SUBA on its own.

From my observation, I speculated that the 68080 has two IEUs.

Quote:

The IBM 970 PowerPC was the first PPC having the same feature as the APOLLO 68080
All previous PPCs cannot do this automatic memory prefetching and because of this have much lower memory performance. This feature makes a big difference and is the reason why an Apollo 68080 at 90MHz can outrun, in memory operations, a Motorola G4 PPC at 1000MHz running in an AmigaONE.

MAI Logic's Articia S enabled AmigaOne XE is a sitting duck.

In the early 2000s, I had NVIDIA's nForce 2 DDR-333/DDR-400 instead. I gave away my MSI K7T VIA KT133A motherboard to my relatives.

The AmigaOS 4.x PowerPC camp has replaced Articia S-based motherboards.

Quote:

5) The APOLLO 68080 has 4 ALU Units:
- 64bit AMMX SIMD unit
- FPU
- 2 normal integer ALUs

64-bit AMMX is useless for Amiga's 68K legacy i.e. should have focused on fattening the 68K FPU instead of 64-bit AMMX.

I'm aware of work-in-progress Amiga-related projects with Altera Cyclone 10 FPGA usage.

I'm aware that V4's Cyclone 5 may not have a transistor budget for multiple FPUs. Have you considered asymmetric FPU FP32/FP64 and FPU FP32 pipelines?

Quote:

6) 2 parallel working Bus Memory Controller

The 68080 has 2 independent memory controllers.
The fastmem memory controller supports 64-bit access and supports prefetching and bursting.

The 64bit wide memory bus allows the 68080 to reach far better memory performance than any 68K before.

That's a logical evolution when 68060's dual 68K decoders need a 64-bit front-side bus. 68060 has a lesser priority for Motorola.


Quote:

The Apollo 68080 has in addition a second 32bit memory controller which can in parallel operate.

Both controllers can work in parallel without blocking.

The Amiga bus/ Zorro and Chipmemory are connected to the 2nd controller.

This allows the 68080 CPU in an Amiga to, for example, write to chip memory while at the same time loading more instructions from fast memory.

The power of two independent memory controllers makes the 68080 extremely powerful in Amigas.

As you will know, the Amiga 1000/500/600/2000 mainboards speak the 68000 protocol,
while the A1200, A3000 and A4000 mainboards speak the different 68020/30 protocol.

The Motorola 040/060 CPUs have another, incompatible protocol.

I'm aware that 040/060 CPUs have an incompatible protocol, hence the necessary glue logic.

C= A3640's relatively large size PCB and glue logic are not for show. Other 3rd party Amiga CPU accelerators have superior implementation when compared to C= efforts.

Quote:

This makes using 040 or 060 CPU complicated in Amiga - as you need extra logic to translate the protocols.
This makes 040 and 060 slower in accessing Amigas chipmemory.

With a very fast CPU, A1200's Chip RAM access is about 7.1 MB/s. A very fast CPU can cause timing (resource arrival/departure sync) problems with Amiga custom chips, hence turtle/slow-mo mode.

Amiga's legacy hit-the-metal games don't have resource tracking.

For example, PiStorm-Emu68 has reached 7.1 MB/s on A1200's Chip RAM. Extra development time was committed to Emu68's slow-mo features.

Warp1260 can do 7MB/s A1200 Chip RAM writes.

Games like DevilutionX (DIABLO open source clone, 640x480p 256 colors) would be slow on C= AGA.
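
As a rough back-of-envelope check of the bandwidth argument above, the sketch below estimates how many full 640x480 8-bit frames per second could be pushed into Chip RAM at about 7.1 MB/s. The function name and the assumptions are illustrative and not taken from any benchmark: 1 MB is treated as 10^6 bytes, every byte of every frame is rewritten, the whole 7.1 MB/s is available for CPU writes, and any chunky-to-planar conversion cost is ignored.

```python
def max_full_frame_rate(width, height, bytes_per_pixel, bandwidth_bytes_per_s):
    """Upper bound on full-frame updates per second for a given write bandwidth."""
    frame_bytes = width * height * bytes_per_pixel
    return bandwidth_bytes_per_s / frame_bytes


# 640x480 at 256 colors = 1 byte per pixel (8 bitplanes is the same byte count).
# 7.1e6 bytes/s is the Chip RAM figure quoted in the post (assumed decimal MB).
fps = max_full_frame_rate(640, 480, 1, 7.1e6)
print(f"{fps:.1f}")  # prints 23.1
```

Under these assumptions the ceiling is about 23 full-frame updates per second before any game logic runs, which fits the expectation that such a game would be slow on AGA.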

Quote:

The APOLLO 68080 CPU speaks natively both 68000 and 68030 bus protocol.
Thus means you can directly connect it to Amiga mainboards - and not need any translation that would slow down.

What's important is performance in real-world program benchmarks, e.g. Lightwave, Quake, etc.

Shapeshifter should be able to run MacOS 68K's Adobe Premiere 4.0 non-linear editing (NLE) software.

Last edited by Hammer on 14-Feb-2024 at 04:40 AM.
Last edited by Hammer on 14-Feb-2024 at 04:04 AM.
Last edited by Hammer on 14-Feb-2024 at 03:57 AM.
Last edited by Hammer on 14-Feb-2024 at 03:16 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

agami 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 2:48:58
[ #143 ]
Super Member
Joined: 30-Jun-2008
Posts: 1648
From: Melbourne, Australia

@Karlos

Quote:
Karlos wrote:
@ppcamiga1

What are you going to do when your PPC machine(s) fail? Is that the end of the road for you?

Yes. Can it please be the end of his Amiga forum posting road.

Now, how to make his Mac mini fail?

_________________
All the way, with 68k

agami 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 3:26:53
[ #144 ]
Super Member
Joined: 30-Jun-2008
Posts: 1648
From: Melbourne, Australia

@NutsAboutAmiga

Quote:
NutsAboutAmiga wrote:

...expect we also have native PowerPC programs and tools

Really? A treasure trove, I'm sure.
It's basically a somewhat contemporary web browser, and in the case of MorphOS a competent modern email client.

Almost any "native PowerPC" app on AmigaOS and MorphOS is a direct or indirect port of an open source project. Which can also be ported to Amiga OS 3 or AROS 68k running with an emu68 accelerator. Which BTW will soon, if not already, outnumber AmigaOS 4/MorphOS PowerPC systems in active use.

_________________
All the way, with 68k

agami 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 3:56:45
[ #145 ]
Super Member
Joined: 30-Jun-2008
Posts: 1648
From: Melbourne, Australia

@ppcamiga1

Quote:
ppcamiga1 wrote:

x86 and arm is boring
i have x86 and arm in work and don't want to made software for it after work
accept that

No one has x86 and ARM at work. They have applications running on Windows or macOS.
Unless they're writing compiler software or drivers, and I know you're not doing any of that. Or any ASM coding.

Embedded software developers use C and C++. They're also implementing HAL, so even they don't care if it's ARM or RISC V, or whatever.

WinUAE is just like Amiga from 1997, only million times faster.
I know you have WinUAE and I know you love it.
I know you hate Windows, but you don't know how to set up UAE on Linux so you're stuck with loving Amiga OS on a Windows PC you hate.

Mature your childish psyche.

Last edited by agami on 14-Feb-2024 at 03:59 AM.

_________________
All the way, with 68k

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 6:48:16
[ #146 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Hammer

Quote:
It's unwise to state that the "2 instructions per clock" is enough.
It's never enough, hence why other CPU companies have road maps.


To whom do you talk?
I never said this.

The Apollo 68080 CPU has 6 EXECUTION units.
2 EA, 2 IALU, 1 AMMX, 1 FPU
The Apollo 68080 can do up to 4 instructions per cycle.




Gunnar 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 7:12:31
[ #147 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Hammer

Quote:

Modern CPUs have prefetch into cache features. This feature can be implied (speculative via hardware) or explicit (software e.g. PREFETCHW). One of several methods for minimizing pipeline stalls.


Yes good modern CPUs have hardware prefetching.
The same way as the APOLLO 68080 has hardware prefetcher.

Your post makes it sound like software prefetching and hardware prefetching are the same.

Are they the same?
No they are very different in result.

Software prefetching can in general not be used from high-level languages.
This means C compilers, for example on PPC, will never use it.

This means no normal application will benefit from Software prefetching.
You always need to handwrite this in assembler and you need to write different code for each CPU model.

Using software prefetching is pretty complicated, and the code that you write needs to be 100% tuned for one CPU model. You need to use the prefetch instructions differently on each CPU model.

This means you need to write the code differently for the PPC 5200 (Efika),
differently for the PPC 440 (SAM), differently for the PPC 750 (Pegasos 1), differently for the PPC 7400 G4 (Pegasos 2/AmigaOne), and you need different code for the PPC 970 G5 PowerMac and different code again to run properly on CELL.

I have written at IBM optimized Memcopy routines using software prefetching for different POWER models.
You have to write the code 10 times - for each model and switch to the right routine.

Hardware prefetchers are orders of magnitude easier to use and a lot more useful.
Hardware prefetchers work automatically and all software benefits from them.


All PowerPC CPUs ever made support software prefetching.
But as explained, because of all the challenges in using software prefetching, it's not that useful for general software in my experience.

IBM's high-end POWER CPUs have had hardware prefetching for a very long time.
The first consumer PowerPC with hardware prefetching was the IBM PPC 970 (G5).


Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 9:36:20
[ #148 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@agami

Quote:

@ppcamiga1

I know you have WinUAE and I know you love it.
I know you hate Windows, but you don't know how to set up UAE on Linux so you're stuck with loving Amiga OS on a Windows PC you hate.


If only it was as simple as running the windows binary on wine. Oh wait. It is. That's how I do it.

_________________
Doing stupid things for fun...

Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 11:34:40
[ #149 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@NutsAboutAmiga


@green_naam wrote:
Quote:
Intuition does a zillion copymem calls per minute. Even as small as several bytes. But at least 33% are unaligned. Meaning that the real-world speedup of cache-optimised copy routines is very small. The only real difference that I could find so far was copying files. There I could achieve a speedup of ~35%.


Which is what I was trying to tell you. You only get a "huge boost" if your workload is memory copy bound, which most application software tends not to be.

Typical compiled application code in languages like C and C++ won't even contain calls to the system routines for copying unless someone has explicitly called them or a standard library routine that calls them, it just inlines the necessary instructions to transfer data from source to destination. You only need to look at the assembler output of your source to verify that.

@green_naam
Quote:
A simple byte copy loop is already faster than CopyMem on OS4 for X5000. I assume that OS4 uses a byte copy loop as well, but doing it yourself would save the call overhead.


My recollection of the code is that it has a number of cases it passes through to try and optimise copies but that's some years ago now and the chances are they only ever made any real difference on 60x/G3/G4. It may be the case it has been replaced with something simpler, or is patched for the most stable solution for any given supported HW.

Regardless of which, your observation here demonstrates my argument to NutsAboutAmiga very well.

Last edited by Karlos on 14-Feb-2024 at 11:44 AM.
Last edited by Karlos on 14-Feb-2024 at 11:44 AM.

_________________
Doing stupid things for fun...

umisef 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 12:09:26
[ #150 ]
Super Member
Joined: 19-Jun-2005
Posts: 1714
From: Melbourne, Australia

@Gunnar

Quote:

Quote:
And let's be real --- the only reason the 68080 can do that (if it can), is because it is running at 1995 clock rates, but on hardware containing modern day memory blocks.


The reason is that the 68080 is designed like a modern CPU from INTEL or IBM.

And because it not only runs in an FPGA, but was designed for an FPGA, with the inherent imbalance between the speed of embedded SRAM memory and the speed of the CLB-implemented logic. It was designed with ridiculous levels of memory bandwidth per ALU clock cycle as a given.

Quote:

The reason that the 68080 runs slow is because it's inside an FPGA.
If you put the latest IBM 4 GHz PPC core into an FPGA - then it will also run at a clock rate comparable to that of the 68080.

Yes --- and it would be a remarkably stupid design at that. Because the latest IBM PPC core was designed for raw silicon, not for FPGA. It was designed for ALU clocks measured in fractions of nanoseconds.

Quote:

When we spoke about how many cycles a PPC needs for this example,
I spoke from experience - as a person who knows IBM PowerPC cores by heart,
which are for example used in several so-called PowerPC Amigas.

You spoke "fantasy" numbers. You did not speak about any 460 PowerPC or 750 PowerPC, nor about the 970 PowerPC.


So you still maintain that a PPC would use 6 instructions to add 100,000 to a memory location?

Quote:

I find it disappointing that you spoke "out of your ass" inventing numbers


See, that's why I included the link to the compiler explorer. So that anyone -- yes, even you, Gunnar --- could retrace the steps I took to arrive at those numbers. And somehow, while you are spewing invective, you don't seem to actually refute them...

Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 12:38:12
[ #151 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@geen_naam

No rephrasing necessary. I was using your reply to respond to an earlier point @NutsAboutAmiga makes that 68K applications running under emulation would get a huge boost on OS4 thanks to things like native memory copy.

My counterpoint was that most application code doesn't actually use it anyway, since the compiler and even assembly language programmers on 68K are more likely to just use a few inlined move instructions to perform a shallow value copy of some struct. What they don't do is call CopyMem() for those because the overhead in doing so isn't worth it for copies below a certain size anyway.

This is not to say that said 68K applications won't benefit from other native system calls, in particular, graphics, datatypes or other data processing intensive operations.

_________________
Doing stupid things for fun...

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 12:57:02
[ #152 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@umisef


Quote:
And because it not only runs in an FPGA, but was designed for an FPGA,


This is not true.
Why do you claim stuff that you do not know?
Please stop doing this.
Making up such stories is unfair.



Do you have experience in CPU design?
I think you don't!

Do you understand Verilog and VHDL, and how chips are made?
I think you don't!

Do you have expert knowledge of how several 68K CPUs work internally?
I think you don't!

Do you have expert knowledge of how several PowerPC CPUs work internally?
I think you don't!

Is it not true that you have no experience and no knowledge in the topic you talk about?
Nevertheless you make wrong claims, and attack people who have both knowledge and experience in the area.




@Umisef, there is nothing wrong with not understanding a topic.
I'm happy to help you and give you explanations and tell you how things work internally.


But what you do, making up stuff about topics you do not know much about, has a name:
You are lying! Please stop this.


Yes FPGA are a development vehicle for developing CPU cores.
At IBM we did develop all Power and PowerPC cores in simulation and in FPGAs.
The APOLLO 68080 is designed and developed using the same techniques.
That today CPUs are developed in FPGA is state of the art.



Quote:

So you still maintain that a PPC would use 6 instructions to add 100,000 to a memory location?



Yes, off the top of my head the PPC code uses 6 instructions for this, plus 2 memory/cache accesses

PowerPC

1 lis R2,ha16(100000)
2 addi R2,R2,lo16(100000)
3 lis R3,myscore'high
4 ldw myscore'low(R3),R4
5 add R4,R4,R2
6 stw R4,myscore'low(R3)


==

Yes the 68K only needs 1 instruction for this, plus 2 memory/cache access

68K

add.l #100000,myscore



If you see better code than mine, then just post it and we can talk about it.
Of course, on the PowerPC you also have the option to trade one instruction
for one more memory access. You can for example put the
number 100000 in your program data section somewhere and load it from memory.
Trading 1 instruction for 1 extra memory access can make sense,
but in my experience it's not a win.


@Umisef, I'm a PowerPC developer by profession.
I took part in developing CPUs that you might use.

First of all, I would appreciate if you could be less aggressive.
Can we talk about this like normal people?
I have worked on CPU design and can also explain to you where certain CPU bubbles come from.
Of course we can even talk about what features the different CPU models have.
For example POWER 8 took one design idea from the APOLLO 68080 CPU.
So there is one idea that we had on APOLLO that we took over to the POWER8 core.
I think the whole discussion should be put on a different level.
Less claiming more asking and discussion.
Do you think we can do this?

Last edited by Gunnar on 14-Feb-2024 at 02:22 PM.
Last edited by Gunnar on 14-Feb-2024 at 01:14 PM.

Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 13:47:30
[ #153 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar & @umisef

So, this is my result from the compiler explorer, using powerpc gcc13.2 with -Ofast


extern int myscore;

void inc() {
    myscore += 100000;
}


compiles down to

inc():
    lis 10,myscore@ha
    lwz 9,myscore@l(10)
    addis 9,9,0x2
    addi 9,9,-31072
    stw 9,myscore@l(10)
    blr

_________________
Doing stupid things for fun...

CosmosUnivers 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 13:51:34
[ #154 ]
Regular Member
Joined: 20-Sep-2007
Posts: 101
From: Unknown

@Gunnar

Quote:

At IBM we did develop all Power and PowerPC cores in simulation and in FPGAs

Yes the PPC code uses 6 instructions for this, plus 2 memory/cache access

PowerPC

1 lis R2,ha16(100000)
2 addi R2,R2,lo16(100000)
3 lis R3,myscore'high
4 ldw myscore'low(R3),R4
5 add R4,R4,R2
6 stw R4,myscore'low(R3)


Yes the 68K only needs 1 instruction for this, plus 2 memory/cache access

68K

add.l #100000,myscore



Oh, the PPC is from you and some others... I understand now !

- Gunnar knows the 68k was good; he inverts it and created something worse (PPC)...

- Gunnar knows the 68k instructions are easier for the brain to learn (like 'move'...); he inverts it and used only the first letters for the PPC instructions ('mr' = move register...)

- Gunnar knows ONE instruction is good; he inverts it and created the PPC with SIX instructions for doing the same thing...

- Gunnar knows the 68k's PC-relative addressing is excellent (fewer reloc hunks); he inverts it and deleted this feature from the PPC...

- Gunnar knows the CPU core instruction design is made for the coder's fun; he inverts it and created the PPC with something unreadable (very few asm coders on the PPC)...

- Gunnar knows the 68k asm was loved by coders; he inverts it and designed his PPC for the C/C++ compilers...

- Gunnar knows the 68k has tiny code size; he inverts it and created the PPC with much bigger code...


!!!!!!!!!!!!!!! AND HE CALLS THIS "PROGRESS" COMPARED TO THE PREDECESSOR 68K : NO RISC NO FUN !!!!!!!!!!!!!!


Gunnar knows the PPC is a big shame; he inverts it and is proud of his crap...

There is zero hope with this inverted guy, the Amiga community must wake up and see what they are : most of our elite are like this...

Last edited by CosmosUnivers on 14-Feb-2024 at 02:19 PM.
Last edited by CosmosUnivers on 14-Feb-2024 at 02:05 PM.
Last edited by CosmosUnivers on 14-Feb-2024 at 02:02 PM.
Last edited by CosmosUnivers on 14-Feb-2024 at 01:53 PM.
Last edited by CosmosUnivers on 14-Feb-2024 at 01:53 PM.

Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 13:53:29
[ #155 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

I just reminded myself how much I like gcc assembler syntax for powerpc. Which is not at all.

So it adds on 0x20000 (131072) by using add immediate shifted (0x2 shifted left by 16 bits), then adds -31072 to correct it, before writing back.
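
The split that the compiler performs here can be checked with a few lines of arithmetic. The following is a minimal Python sketch (the helper name split_ha_lo is hypothetical) of the usual high-adjusted/low split: because addi sign-extends its 16-bit immediate, a low half of 0x8000 or more counts as negative, and the high half is bumped by one to compensate.

```python
from typing import Tuple


def split_ha_lo(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (high-adjusted, signed low) 16-bit halves."""
    lo = value & 0xFFFF
    if lo >= 0x8000:
        # addi sign-extends its immediate, so this half acts as a negative number
        lo -= 0x10000
    # whatever remains after removing the signed low half goes into the upper 16 bits
    ha = ((value - lo) >> 16) & 0xFFFF
    return ha, lo


ha, lo = split_ha_lo(100000)
print(hex(ha), lo)      # prints: 0x2 -31072
print((ha << 16) + lo)  # prints: 100000
```

The two printed immediates match the addis and addi operands in the compiler output: 131072 - 31072 = 100000.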

Last edited by Karlos on 14-Feb-2024 at 01:56 PM.

_________________
Doing stupid things for fun...

Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 14:03:42
[ #156 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@CosmosUnivers



Gunnar didn't invent PowerPC or the syntax for its assembler. He worked on implementations of it.

_________________
Doing stupid things for fun...

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 14:16:23
[ #157 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Karlos

Quote:

lis 10,myscore@ha
lwz 9,myscore@l(10)
addis 9,9,0x2
addi 9,9,-31072
stw 9,myscore@l(10)


I stand corrected. Thank you.
Doing it this way, you can actually do it in 5.

But you can still call "5 versus 1" a disadvantage.


In my opinion an important part here is to know that the LOAD instruction "LWZ" is executed on the same pipeline step as the ALU ADDIS - but not at same time!

The 68K has extra pipeline stages for doing loads.
This means the 68K design can do a load as part of the normal pipeline without extra bubbles.

But the PowerPC can not do this.

In other words, the pipeline of many PowerPC chips has a load-to-use bubble after the LWZ.
Often, between the LWZ and the next instruction using its result, the CPU needs to wait 2 extra bubble cycles.
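
To make the effect of such a load-to-use bubble concrete, here is a toy cycle count for the five-instruction sequence quoted above. It is only a sketch under loud assumptions: a single-issue, in-order pipeline, one cycle per instruction, and a 2-cycle load-to-use penalty taken from the figure mentioned in this post rather than from any specific PowerPC core manual. The names SEQUENCE and count_cycles are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# (mnemonic, destination register or None, source registers, is_load)
Instr = Tuple[str, Optional[str], List[str], bool]

SEQUENCE: List[Instr] = [
    ("lis",   "r10", [],            False),
    ("lwz",   "r9",  ["r10"],       True),
    ("addis", "r9",  ["r9"],        False),
    ("addi",  "r9",  ["r9"],        False),
    ("stw",   None,  ["r9", "r10"], False),
]


def count_cycles(program: List[Instr], load_use_penalty: int) -> int:
    """Toy model: in-order, single-issue; a load result arrives `load_use_penalty` cycles late."""
    ready: Dict[str, int] = {}  # cycle at which each register value becomes usable
    cycle = 0                   # next free issue slot
    for _name, dest, sources, is_load in program:
        # stall until every source register is ready
        issue = max([cycle] + [ready.get(src, 0) for src in sources])
        if dest is not None:
            ready[dest] = issue + 1 + (load_use_penalty if is_load else 0)
        cycle = issue + 1
    return cycle


print(count_cycles(SEQUENCE, 0))  # prints 5 (no bubble)
print(count_cycles(SEQUENCE, 2))  # prints 7 (two bubble cycles after lwz)
```

In this toy model the only stall is between lwz and the addis that consumes its result; real cores can hide some of that latency by scheduling independent instructions in between.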

Karlos 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 14:21:06
[ #158 ]
Elite Member
Joined: 24-Aug-2003
Posts: 4402
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Gunnar

5 or 6, either way, it's still eye-stabbingly awful to look at.

_________________
Doing stupid things for fun...

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 14:32:30
[ #159 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@Karlos

Quote:
5 or 6, either way, it's still eye-stabbingly awful to look at.


I think every programmer agrees that 68K code is very easy to read and PPC code is difficult to read.

In my experience poor readability is also a real pain when you debug code.
When you debug, you often go through the assembler code instruction by instruction and follow it.


I personally find 68k instruction very nice and easy to read:
FADD.D #1.0,fp0

Gunnar 
Re: 32-bit PPC on FPGA
Posted on 14-Feb-2024 14:38:30
[ #160 ]
Regular Member
Joined: 25-Sep-2022
Posts: 477
From: Unknown

@CosmosUnivers

Quote:
Oh, the PPC is from you and some others... I understand now !


No, I did not invent the PPC architecture.
When PowerPC was invented I was still in my last year of school, coding cracktros on my Amiga.

Later I just worked for IBM in a PowerPC hardware development center.
Some people take jobs to work for money - instead of living on social assistance.

Copyright (C) 2000 - 2019 Amigaworld.net.
Amigaworld.net was originally founded by David Doyle