Amigaworld.net - The Amiga Computer Community Portal Website

some words on senseless attacks on ppc hardware

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 20-Mar-2024 6:07:14

[ #1121 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4431
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

You claimed Intel's AVX beats PPC's VMX/Altivec, and this implies a performance criterion.

Yes, in THIS context it was about performance.
Quote:
IBM's POWER8 has 64 x 128-bit VSR SIMD registers, 8,192 bits in total.

Intel's current client Raptor Lake and Meteor Lake SKUs are stuck on 16-register AVX2, 4,096 bits in total.

Intel's AVX10.1 non-Xeon SKUs are stuck on the 32-register, 256-bit AVX10 SIMD instruction set, 8,192 bits of fast SRAM data storage in total.

LOL. Again?!? Potatoes! Potatoes! Potatoes! I'm the king of potatoes because I have the biggest amount of them!
Quote:
Intel's AVX10.x will not be released until Granite Rapids.

Intel does NOT guarantee 256-bit SIMD hardware for their Core i3 N series Gracemont and P-Core count greater than 8 across all Raptor Lake SKUs. Intel is selling half-baked AVX2 hardware.

And WHO CARES?!?
Quote:
You have shown your ignorance. Look in the mirror.

Sure, King of Potatoes.
Quote:
With PCIe 4.0 support and greater than 4,096 bits vector register storage criteria, I can purchase a reasonably low-cost IBM POWER9 CPU. The problem is platform cost.

Again? Potatoes! Potatoes! Potatoes! I'm the king of potatoes because I have the biggest amount of them!
Quote:
I'm not buying yet another Intel SkyLake-X with struggling 4Ghz and aging PCIe 3.0 I/O.

But you bought it, Mr. Coherence.
Quote:
Since Zen 2's release, AMD has abandoned 128-bit SIMD-equipped Ryzen Zen 1, Jaguar/Puma on the desktop client-side SKUs.

It was Zen3. Zen 2 was a small update of Zen1.
Quote:

Hammer wrote:
@cdimauro

Quote:

Hey, have you bought processors that have AVX-512 but implemented only with 256 bits? Really? "Coherent

1. Refer to AM5's road map. Zen 5 is a drop-in replacement for AM5.

LGA 1700 is a dead end.

LGA 1700 lacks AM5's two discrete NVMe PCIe 5.0 4X lanes e.g. ASUS ROG Strix X650-E.

2. On most Intel CPUs with AVX-512 support, there are 2 classes of 512-bit instructions: instructions executed by combining a pair of 256-bit units, hence having an equal throughput for 512-bit instructions and 256-bit instructions, and the second class of instructions, which are executed by combining a pair of 256-bit execution units and also by extending to 512 bits and another 256-bit execution unit.

For the second class of instructions, the Intel CPUs have a throughput of two 512-bit instructions per cycle vs. three 256-bit instructions per cycle.

Compared to the cheaper models of Intel CPUs, Zen 4, while having the same throughput as Zen 3, i.e. two 512-bit instructions per cycle vs. four 256-bit instructions per cycle in Zen 3, either matches or exceeds the throughput of the Intel CPUs with AVX-512.

Compared to the Intel CPUs with AVX-512, Zen 4 allows 1 FMA + 1 FADD, while on the Intel CPUs only 1 FMA per cycle can be executed.

The only important advantage of Intel appears in the most expensive models of the server and workstation CPUs, i.e. in most Xeon Gold, all Xeon Platinum, and all of the Xeon W models that have AVX-512 support.

In these more expensive models, there is a second 512-bit FMA unit, which enables a double FMA throughput compared to Zen 4. These models with double FMA throughput are also helped by a double throughput for the loads from the L1 cache, which is matched to the FMA throughput.

The AVX-512 implementation in Zen 4 is superior to that in the cheaper CPUs like Tiger Lake, even without taking into account the few new execution units added in Zen 4, like the 512-bit shuffle unit.

Only the Xeon Platinum and the likes of the Sapphire Rapids will have a greater throughput for the floating-point operations than Zen 4, but they will also have a significantly lower all-clock frequency (due to the inferior manufacturing process), so the higher throughput per clock cycle is not certain to overcome the deficit in clock frequency.

Intel Sapphire Rapids is not ideal for a low thread count Adobe content creation suite nor in low latency/very high clock speed PC games. Intel Sapphire Rapids wouldn't beat AMD's Ryzen 7 7800X3D in PC games.

A wall of nonsense to avoid answering the question. Let me quote YOU again:

"don't support XYZ-bit AVX* instruction set marketing when its corresponding XYZ-bit hardware is NOT guaranteed"

But you've done it, Mr. Coherence!
Quote:
----
Back in the real world:

For the money vs performance and pure performance, raytracing is better on hardware accelerated with RTX ADA GPUs and perhaps on USD $999 RX 7900 XTX. The Amiga's custom ASIC hardware acceleration spirit is alive with NVIDIA's and AMD's hardware-accelerated raytracing-capable GpGPUs. The large bulk AI workload is dominated by NVIDIA.

The use case for server X86-64 CPUs is FP64, FP80, larger scale remote desktops/virtual machines, higher PCIe lanes, OS host, hypervisor host, large scale databases, and non-GPU accelerated programs.

There are NVIDIA and AMD server-only GpGPUs with good FP64.

For large bulk AI processing contract work, I rather use my RTX 4090, RTX 4080, and two RTX 3080 Ti GPUs.

Here's Hammer's usual PADDING...
Quote:
Quote:

What a mess. You jump from pipelining to superscalar pipelines with nonchalance: they are the same thing to you. Just to show, again, how ignorant you are.

BTW, even the 6502 was pipelined...

Where's 65K's 3 GHz clock speed implementation?

Where's 65K's at least 1 IPC throughput?

Where's 65K's 32-bit implementation?

Where's 65K's 64-bit implementation?

For ASIC implementation, the 65xx CPU family is a dead end. ARM's road map replaced Commodore's 65xx crap R&D road map. Western Design Center (WDC) stalled around the 16-bit 65C816. The 65C832 wasn't released.

In the real world, Acorn's ARM CPUs replaced the crap 65xx R&D road map from Commodore and Western Design Center (WDC).

And here again, since you do NOT understand what pipelining means, you're rolling out another wall of nonsense. As usual...
Quote:

Hammer wrote:
@cdimauro

Quote:
ROFL. You're counting the bits used for all SIMD registers and using the totals to compare two architectures.

So, you just count them and completely ignore how they are used. STRA-LOL.

What do you think, that the bits used in registers are like the potatoes sold at the market around the corner? The more you have, the better it is?

This is THE clear measure of how you do NOT understand, at all, architectures.

I add nothing else, because enough is enough.

What matters in the real world is performance.

No, in this context we were talking about RISCs vs CISCs with their foundations.
Quote:
https://www.youtube.com/watch?v=yTMRGERZrQE
Jim Keller: Arm vs x86 vs RISC-V - Does it Matter?

Yes, it still matters. As I've PROVED in my article.

And as I've PROVED by reporting two clear examples that dismantle the statement from Stanford University. You can ask Jim Keller to help here, eh!
Quote:
Jim Keller >>>>>> YOU.

Oh, what news: a logical fallacy! That's argumentum ab auctoritate: https://en.wikipedia.org/wiki/Argument_from_authority

Great way to argue, eh? Well, you have no clue at all about those topics, so that's the most that you can do: logical fallacies to "sustain" your "position".
Quote:

Hammer wrote:
@cdimauro

Quote:

what about this: https://en.wikipedia.org/wiki/ARM_architecture_family#Thumb-2

Thumb-2 technology is available in the ARMv6T2 and later architectures.

Thumb-2 instructions are still fixed-length, just in a shorter 16-bit form.

For ARMv6T2 and beyond, the programmer can select two fixed-length instruction sets.

And here, again, you show how ignorant you are, since you've never opened the Thumb architecture manual.

No, Thumb-2 is a mix of 16 AND 32-bit opcodes. So, it's variable-length by definition!

And not only Thumb-2: even Thumb (the first ISA) has 16 and 32-bit instructions, despite being reported as 16-bit only.

Is it that difficult to open their manual and CHECK THE LENGTH OF THEIR INSTRUCTIONS?!? Instead of wasting time by googling...
Quote:
Quote:

What a mess. You jump from pipelining to superscalar pipelines with nonchalance: they are the same thing to you. Just to show, again, how ignorant you are.

68060 has a low clock speed, idiot.

LOL. Another statement of your complete ignorance, since this has NOTHING to do with the topic which was being discussed.

And what do you do to reinforce your ignorance? Go purely personal and offend again. Since that's the only thing that you are able to do, ignorant!

Now, PROVE to me that your statement about the 68060 has anything to do with this part of the discussion (read CAREFULLY what I've written)!

Status: Offline

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 20-Mar-2024 6:09:31

[ #1122 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4431
From: Germany

@matthey

Quote:

matthey wrote:

There are likely other reasons the 68040 runs hot like less power gating than the 68060.

What I recall is that it was using an internal clock with double the nominal frequency.

So, basically like what Intel did with its 80486DX2.

Double the clock -> much more heat.

Which COULD explain it. But I'm not sure about it: to be verified.

Status: Offline

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 20-Mar-2024 6:19:42

[ #1123 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@Hammer

Quote:
IBM's POWER8 has 64 x 128-bit VSR SIMD registers, 8,192 bits in total.
Intel's current client Raptor Lake and Meteor Lake SKUs are stuck on 16-register AVX2, 4,096 bits in total.

And IBM CELL has 8 x 128 x 128-bit Vector register = 131,072 bits total!
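The totals traded in this exchange are plain multiplication: number of registers times register width. A small sketch that reproduces them, using only the counts quoted in the posts (the dictionary layout and names are illustrative, and the tally by itself says nothing about performance):

```python
# Register-file size in bits = number of registers x register width.
# Counts and widths are the ones quoted in the posts; names are illustrative.
register_files = {
    "POWER8 VSR": (64, 128),
    "AVX2 (client)": (16, 256),
    "AVX10 256-bit": (32, 256),
    "CELL (8 x 128 registers, as posted)": (8 * 128, 128),
}


def total_bits(count, width):
    """Total architectural vector register storage in bits."""
    return count * width


totals = {name: total_bits(c, w) for name, (c, w) in register_files.items()}

for name, bits in totals.items():
    print(f"{name}: {bits:,} bits")
```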

Status: Offline

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 20-Mar-2024 6:36:57

[ #1124 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@cdimauro

Quote:
There are likely other reasons the 68040 runs hot like less power gating than the 68060.

This was not a topic in the '90s yet.

Quote:
What I recall is that it was using an internal clock with double the nominal frequency.

The 68040@25 and the 68060@50 run at the same clock of 50 MHz.

--

Power consumption consists of 2 parts.

A) Active power consumption.
The count of "bits" that flip inside the chip times the voltage you use.

The more the chip "does" = more bits flip per clock,
the higher the clock = more flips per second,
the higher the core voltage
= the more power is consumed.

You can reduce power consumption by making sure you do not "flip" bits when doing nothing.
For this you can clock-gate units that are not in use.

B) Passive leakage of power.
Leakage means power you lose even without flipping.
Leakage is a bigger problem for today's chips than for those of the '80s and '90s.

The Motorola 68040@25 and Motorola 68060@50MHz both run internally at 50 MHz.

040 = 50MHz * 5 Volt = more heat
060 = 50MHz * 3 Volt = less heat
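One rough way to put a number on this comparison is the textbook CMOS switching-power model, P = a * C * V^2 * f, in which the supply voltage enters squared rather than linearly. The sketch below is an illustration under loud assumptions: it takes the 50 MHz, 5 V and 3 V figures from the post, assumes the same activity factor and switched capacitance for both chips (not true of real 68040 and 68060 silicon), and ignores leakage.

```python
def dynamic_power(activity, capacitance, voltage, freq_hz):
    """Textbook CMOS switching power: P = a * C * V^2 * f (watts)."""
    return activity * capacitance * voltage ** 2 * freq_hz


# Hypothetical, identical activity factor and switched capacitance for both
# chips, so only the supply voltage differs. Voltages and clock are the
# figures from the post, not datasheet values.
ACTIVITY = 0.1
CAPACITANCE = 1e-9  # farads, made up for illustration

p_040 = dynamic_power(ACTIVITY, CAPACITANCE, 5.0, 50e6)
p_060 = dynamic_power(ACTIVITY, CAPACITANCE, 3.0, 50e6)
ratio = p_040 / p_060

print(f"5 V vs 3 V at the same clock: {ratio:.2f}x the switching power")
```

Because the activity and capacitance terms cancel in the ratio, the made-up values do not affect it; only the voltage ratio squared remains.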

Status: Offline

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 20-Mar-2024 7:09:03

[ #1125 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@matthey

Quote:
The 68060 should have been out clocking 3-6 stage pipeline CPUs of the day, including most simple RISC cores

I can see why you think this at first glance.
But I can explain to you why this is not the case.

It's correct that early PowerPC had a shorter pipeline than the 68060.
But this does not mean that the 68060 can be clocked higher.

For the clock rate that you can reach, it is not the number of pipeline stages that matters
but the amount of logic in a single pipeline step.

What counts is the number of transistors from register to register.

The 68060 does a lot more work in its pipeline in one instruction than the PowerPC does.

The 68060 pipeline is designed to calculate an EA, do a cache load, and do an ALU operation on it,
and even write the result back - in a single instruction.

This means it is effectively doing 3 times the amount of work that the PowerPC pipeline does.
This means it has A LOT more transistors in its pipeline.

If you keep this in mind then it will make sense
that even if the 68060 has more pipeline stages than the 601/603 PowerPC -
as it does 3 times more work in its pipeline ...
the amount of transistors in each stage is NOT much less than in a PowerPC step - therefore it cannot outclock them.

Makes sense now?

Status: Offline

matthey

Re: some words on senseless attacks on ppc hardware
Posted on 20-Mar-2024 21:27:28

[ #1126 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2743
From: Kansas

Hammer Quote:

Where's 65K's 3 GHz clock speed implementation?

Where's 65K's at least 1 IPC throughput?

Where's 65K's 32-bit implementation?

Where's 65K's 64-bit implementation?

For ASIC implementation, the 65xx CPU family is a dead end. ARM's road map replaced Commodore's 65xx crap R&D road map. Western Design Center (WDC) stalled around the 16-bit 65C816. The 65C832 wasn't released.

The 65xx CPU family is a dead end for what purpose? For an extremely inexpensive CPU that sleeps most of the time and primarily does 8 bit processing using a tiny memory, it is still a good embedded CPU and there are plenty of ASICs still using it. The 65xx CPU family was never going to scale up well. It is an accumulator architecture where practically every operation accesses memory and the code density is poor. The 16 bit versions of the 65xx family were for compatibility to leverage existing software with some enhancements.

Hammer Quote:

In the real world, Acorn's ARM CPUs replaced the crap 65xx R&D road map from Commodore and Western Design Center (WDC).

GP register and 32 bit architectures were a major upgrade for performance and versatility. ARM CPUs were nothing special for many years. Originally, they had no caches, no hardware MUL/DIV and poor code density. Superscalar and OoO core designs were even slower to arrive. Performance was usually poor and it was nearly a decade before ARM surpassed the performance/MHz of the 68060. ARM proliferated with affordable licensing and adopting the Thumb2 ISA licensed from Hitachi SuperH tech derived from the 68k to improve code density. Then it was just economies of scale in the embedded market that made them popular but it was far from a quick transformation.

RISC Volume Gains But 68K Still Reigns
https://websrv.cecs.uci.edu/~papers/mpr/MPR/19980126/120102.pdf

In 1997, the graph from the link above shows 32 bit embedded system volume shipments.

1. 68k 79,300,000
2. MIPS 44,000,000
3. SuperH 23,500,000
4. ARM 10,000,000
5. i960 9,000,000
6. x86 9,000,000
7. PowerPC 3,900,000

Why does such a superior ARM architecture have ~1/8 of the 68k volume after Motorola/Freescale designated PPC as the replacement for the 68k? Why are PPC volumes only 1/20 of the 68k? Why would a business give up or even risk this kind of market dominance?
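The "~1/8" and "1/20" figures follow directly from the shipment list above. A quick check, using only the numbers listed (variable and function names are illustrative):

```python
# 1997 32-bit embedded shipments as listed above (units shipped).
shipments = {
    "68k": 79_300_000,
    "MIPS": 44_000_000,
    "SuperH": 23_500_000,
    "ARM": 10_000_000,
    "i960": 9_000_000,
    "x86": 9_000_000,
    "PowerPC": 3_900_000,
}


def ratio_to_68k(arch):
    """How many times larger the 68k volume is than the given architecture's."""
    return shipments["68k"] / shipments[arch]


for arch in ("ARM", "PowerPC"):
    print(f"68k shipped {ratio_to_68k(arch):.1f}x the volume of {arch}")
```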

Gunnar Quote:

That "fixed length" instructions are needed for high clock rate is a MYTH

High clock rate depends only on pipeline length and process used.

Gunnar Quote:

I can see why you think this at first glance.
But I can explain to you why this is not the case.

It's correct that early PowerPC had a shorter pipeline than the 68060.
But this does not mean that the 68060 can be clocked higher.

For the clock rate that you can reach, it is not the number of pipeline stages that matters
but the amount of logic in a single pipeline step.

These 2 statements above are conflicting. The "pipeline length" is the number of pipeline stages, and more stages increase the achievable clock rate. The amount of logic in a single stage may be important. Some logic is done in parallel with no effect on timing, while many sequential logic gates lengthen the time a signal needs to propagate from one stage to the next, which must stay within an acceptable propagation delay. Pipelining reduces the propagation delay by passing through fewer sequential gates in each stage rather than through all gates at once, which allows a higher clock speed. Whether the pipeline is for a RISC core, a CISC core or even something that is not a CPU core does not matter. More complex logic may require more stages, a longer clock cycle or rebalancing the workload between stages. Most stages between a CISC and RISC core are similar.

68060 in-order 8 stage pipeline
1 IAG (instruction address generation)
2 IC (instruction fetch cycle)
3 IED (instruction early decode)
4 IB (instruction buffer)
5 DS (decode and select)
6 AG (address generation, EA calc, early instruction execution)
7 OC (op fetch cycle, EA fetch)
8 EX (instruction execution)

SiFive U74 in-order 8 stage pipeline
1 F0 (instruction fetch 0)
2 F1 (instruction fetch 1)
3 F2 (instruction fetch 2)
4 ID (instruction decode)
5 AG (address generation, early instruction execution)
6 M1 (data memory access 1)
7 M2 (data memory access 2, late instruction execution)
8 WB (register writeback)

Different names sometimes but same design and similar stages. Yes, the U74 core updates the PC based on instruction size and branch prediction. Yes, the U74 core uses an instruction buffer even though it has no stage named after it. Yes, the U74 core executes instructions in the AG and M2 stages because the manual says it does. Yes, the 68060 performs register writeback after the EX stage. There are minor differences even though the order is the same. The 68060 likely does more predecoding before the instruction buffer but the U74 core probably does at least some converting of compressed 16 bit encodings into 32 bit encodings. The U74 core has labeled 3 instruction fetch stages and 2 data fetch stages which may be needed at higher clock speeds depending on the instruction and data cache sizes, cache access timings, fab process and desired max clock speed. The ColdFire V5, which is a more modern, simplified and higher clocked version of the 68060 core design, did add more fetch stages (300-366MHz in 2002, 100% synthesizable, meaning easy auto layout instead of optimized with custom blocks for higher clock speeds).

ColdFire V5 in-order 9 stage pipeline
1 IAG (instruction address generation)
2 IC1 (instruction fetch cycle 1)
3 IC2 (instruction fetch cycle 2)
4 IED (instruction early decode)
5 DS (decode and select)
6 AG (address generation, EA calc, early instruction execution)
7 OC1 (op fetch cycle 1)
8 OC2 (op fetch cycle 2)
9 EX (instruction execution)

There is still an instruction buffer even though the IB stage disappeared and it has been folded into the last instruction fetch cycle like the U74 core. Maybe the simplified ColdFire core reduces the time spent for early decode or maybe optimizations allowed 2 stages of instruction fetch without increasing the pipeline length. Two stages of data fetch does add another stage. This was using a 32kiB instruction and 32kiB data cache circa 2002 with 130nm chip process. The access times for these L1 cache sizes on more modern silicon would improve.

In conclusion, these CISC and RISC pipelines are performing roughly the same stages in the same order. The SiFive U74 core is a very simple in-order core. I expect it is smaller than ARM Cortex-A53 and ARM Cortex-A55 cores which it outperforms even though it can only execute instructions early or late without code fusion/folding, while still providing a good benefit due to reduced load-to-use stalls and early availability of results from instructions executed in the AG stage. The 68060 and CFV5 have the same advantage but common CISC instructions use both the AG stage and EX stage, making them the equivalent of two RISC instructions already packaged into one instruction. While the 68060 may require additional work predecoding, the U74 core requires additional work fusing/folding instructions together to achieve the performance possible with 68k code. Other than this, the stages are not so different and I see no reason why the max clock speed would be much different. Longer pipelines should be able to clock higher regardless of CISC or RISC. Large and complex OoO CPUs can be an exception because of long lines to units far away that require dealing with significant clock skew. In-order cores are usually small and simple, increasing the likelihood of being able to clock high. Even the large in-order POWER6 (5GHz using 65nm process but 6GHz prototypes were tested) was able to out clock the OoO POWER7 (4.25GHz using 45nm process). The in-order 8 stage 68060 should have been able to reach 150MHz using the 500nm process. Instead, the 4 stage PPC 601 and 603 were given die shrinks and clocked up while the 8 stage 68060 remained at 50MHz. This was a political decision all the way as the 68060 wasn't allowed to make the shallow pipeline PPC CPUs look bad.

Gunnar Quote:

What counts is the number of transistors from register to register.

The 68060 does a lot more work in its pipeline in one instruction than the PowerPC does.

The 68060 pipeline is designed to calculate an EA, do a cache load, and do an ALU operation on it,
and even write the result back - in a single instruction.

This means it is effectively doing 3 times the amount of work that the PowerPC pipeline does.
This means it has A LOT more transistors in its pipeline.

If you keep this in mind then it will make sense
that even if the 68060 has more pipeline stages than the 601/603 PowerPC -
as it does 3 times more work in its pipeline ...
the amount of transistors in each stage is NOT much less than in a PowerPC step - therefore it cannot outclock them.

Makes sense now?

I think you are an imposter of Gunnar von Boehn and likely MEGA_RJ. The whole post above is suspect. The FPU post earlier is also suspect. The Pentium is usually considered to have a fully pipelined FPU even though the FDIV and FSQRT are not fully pipelined which is typical as they often use iterative algorithms and are not common enough to pipeline. You seem to confuse FPU instruction latency with throughput as well. It is single cycle throughput that makes an instruction fully pipelined while FPU instruction latencies are often multicycle. I have pointed out other suspect posts before. When are you going to stop the charade?
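The latency versus throughput distinction can be made concrete with a small cycle count. This is a simplified sketch, assuming independent operations on a single unit and no stalls; the 3-cycle latency is a hypothetical figure, not taken from any Pentium or 68060 manual.

```python
def total_cycles(n_ops, latency, issue_interval):
    """Cycles to complete n independent operations on one execution unit.

    latency:        cycles from issue to result for a single operation
    issue_interval: cycles between successive issues
                    (1 = fully pipelined, equal to latency = not pipelined)
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


# Hypothetical 3-cycle FP add, 100 independent operations.
pipelined = total_cycles(100, latency=3, issue_interval=1)
unpipelined = total_cycles(100, latency=3, issue_interval=3)

print(f"fully pipelined: {pipelined} cycles, not pipelined: {unpipelined} cycles")
```

Both units have the same multicycle latency; only the issue interval, which is the throughput, decides whether the unit counts as fully pipelined.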

Last edited by matthey on 20-Mar-2024 at 09:48 PM.
Last edited by matthey on 20-Mar-2024 at 09:36 PM.

Status: Offline

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 29-Mar-2024 9:02:36

[ #1127 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@matthey

Regarding your post that said:

Quote:

the 68060 had more pipeline stages than early PowerPC
and you assumed that therefore it should have reached a higher clock.

Let me help you understand pipelining:
A pipeline step adds a row of internal, invisible registers in the CPU that store all CPU information at this point.
Which means that with each extra pipeline stage the whole CPU content is stored one more time.
This means one pipeline stage will add a LOT of internal invisible registers.

More pipeline stages = a lot more internal registers in the CPU, holding all the content of the core at this point.

What you need to understand is that for the clock rate it is NOT important how many pipeline stages you have

but how many "logic operations" are done from register to register.

The lower you get this transistor count between the pipeline registers, the higher you can raise the clock rate.

This is very simple physics.

The PowerPC does LESS work in its total pipeline than, for example, the 68060 CPU or the Coldfire V4.

Both the Motorola 060 and Coldfire V4
- do EA calculation
- Dcache access
- and ALU operation
- write back
in the total pipeline.

The POWERPC does NOT do this !!!
- do EA calculation / same level ALU
- Dcache access / same level write back

You see, for the execution part of the pipeline the PowerPC does a lot fewer logic combinations.
Also the decoding of the PPC instructions needs fewer logic operations. This means it also has fewer transistors chained there.

This means the logic length of the POWERPC is roughly about "half" that of the 060/Coldfire.
The amount of chained logic operations in the pipeline of the 060 is by design A LOT MORE than in the PowerPC.

Let's make a simple math example to make this clearer for you:

Let's say for the sake of argument you have 480 total logic combinations in the pipeline of the 060
from top to bottom, and you divide this by 8 == 60 per step.
This means 60 is what matters now for the clock rate!

Let's say your "simpler" PowerPC pipeline has only 240 logic combinations from top to bottom
and you divide this by 4 == 60 per step!

This is a simple example; it should be EASY to understand.
Does this help you, is all clear now?
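The arithmetic in the example above can be written out as a tiny sketch. The 480 and 240 figures are the post's "for the sake of argument" numbers, not measurements of either chip, and the even split across stages is an assumption; in a real design the slowest stage plus the pipeline register overhead sets the clock.

```python
def logic_per_stage(total_logic, stages):
    """Average chained logic operations between pipeline registers,
    assuming the work is split evenly across all stages."""
    return total_logic / stages


# Illustrative figures from the post, not real gate counts.
m68060 = logic_per_stage(480, 8)
ppc = logic_per_stage(240, 4)

print(f"68060: {m68060:.0f} per stage, PowerPC: {ppc:.0f} per stage")
```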

Last edited by Gunnar on 29-Mar-2024 at 10:01 AM.

Status: Offline

NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 29-Mar-2024 12:15:49

[ #1128 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12993
From: Norway

@Gunnar

The ARM is interesting in that it has 16-bit and 32-bit length instructions, but has to switch between 16-bit and 32-bit mode, therefore the length of an instruction is always known beforehand.

So you can have the advantage of fitting many instructions in the instruction cache, or you can have the fatter 32-bit instructions with more operand bits.

Last edited by NutsAboutAmiga on 29-Mar-2024 at 12:32 PM.
Last edited by NutsAboutAmiga on 29-Mar-2024 at 12:21 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Status: Offline

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 29-Mar-2024 13:50:43

[ #1129 ]

Cult Member

Joined: 25-Sep-2022
Posts: 512
From: Unknown

@NutsAboutAmiga

Quote:
The ARM is interesting in that it has 16-bit and 32-bit length instructions,
but has to switch between 16-bit and 32-bit mode, therefore the length of an instruction is always known beforehand.
So you can have the advantage of fitting many instructions in the instruction cache, or you can have the fatter 32-bit instructions with more operand bits.

I have many years of ARM coding experience and also CPU internals experience,
as I have worked for HUAWEI in the design department for new ARM cores
and I also had an Archimedes in the "good old" times.
So I'm pretty experienced coding both types ... the old one and the new 64-bit one.

I can share my personal opinion, which is based on my own experience.

In my very personal opinion: the 68k is a lot better to code for.
And I think that the 68K ISA is also the better architecture.

The 68K instruction set is a lot easier to read and easier to code.
The 68K ISA is in my opinion also a lot stronger.
With one 68K instruction you can do a lot more work than with ARM.
And if you use the ISA enhancements that the APOLLO 68080 CPU offers then you have only advantages over the ARM in all areas.

* easier to program
* better code density
* you have more registers available
* you have both 2-operand and 3-operand instruction modes
* your instructions are stronger and can work directly on memory

You only have advantages.

Having coded both for many years - I would always choose 68k over ARM.

Status: Offline

vox

Re: some words on senseless attacks on ppc hardware
Posted on 29-Mar-2024 16:23:19

[ #1130 ]

Elite Member

Joined: 12-Jun-2005
Posts: 3957
From: Belgrade, Serbia

@NutsAboutAmiga

Thank you for pointing out the reworked NallePuh - I believe this newer version isn't (or was not at the time I was using the x1000) on OS4 Depot. Such tools should go into the original installation.

Anyway, yes, OS4 has low backward compatibility for recompiled OS 3.1 software, due to later updates. Surely, it brought the OS forward, but little attention was paid to backwards compatibility. MorphOS is an example that it can be done better.

Surely, Warp3D Nova is an advancement towards full OpenGL and 3D.

But from an x1000 user's perspective, this is how that road went:
- Originally OS 4.2 was prepaid and full Gallium was promised
- The shipped card and CFE would not even produce a video signal on every soft reset,
due to the x86 BIOS on the card and CFE not seeing the soft reset (I was told UBoot knew how to handle this and neither the SAMs nor the x5000 have this problem)
- Even basic 2D drivers needed to be optionally bought with the x1000, since driver support in OS 4.1 beta and FE was limited
- Then Enhancer 1 needed to be bought, which is fine, but it did not bring 3D for the original cards the X1000 was sold with; it needed an SI card
- Once you got an SI card it worked for Warp3D Nova, but there was no backward W3D compatibility
Also you lost 3D in Linux due to no 3D SI support there
- Then you needed Enhancer 2 for full Warp3D Nova, and even then an additional purchase for Warp3D backward compatibility

Yes, I know HunoPPC and kas1e work hard to make quality Warp3D Nova ports, and that there is a software wrapper, but generally these are all half-baked solutions for the older 3D standard. It's not just a few games; there were plugins for AmigaAmp and some other stuff that used W3D.

Plus now Warp3D Nova ports are sold as brand new things.

In general, it cost up to several hundred euros in total, with P&P and import fees, to get OS4 to support something that was shipped as the baseline hardware.

So much for the OS4 dream machine as Trevor advertised it back then.

And it's an example where OS4 shines compared to MorphOS - the most recent drivers are quite good, supporting a variety of cards, except that CFE will never get updated due to lost sources, and high-end models can't be used on the x1000.

In the same manner, while onboard sound card support came quickly, a driver for the onboard LAN never materialized, forcing the use of older, lower-bandwidth PCI cards to get LAN/Wi-Fi for no real reason.

It's generally outrageous from a user's POV.

So the hardware is good, it is what it is; the way OS4 support is handled is ... mystical and cash milking.

RunInUAE was a great tool, and it took several years until a JIT for the x1000 was produced, which made AGA games runnable on the x1000. Before that, OCS only..

Last edited by vox on 29-Mar-2024 at 04:30 PM.
Last edited by vox on 29-Mar-2024 at 04:24 PM.

_________________
OS 3.x AROS and MOS supporter, fi di good, nothing fi di unprofessionalism. Learn it harder way!
SinclairQL and WII U lover :D
YT http://www.youtube.com/user/rasvoja

Status: Offline

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 14-Apr-2024 20:12:21

[ #1131 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4431
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@cdimauro

Quote:
There are likely other reasons the 68040 runs hot like less power gating than the 68060.

This was not a topic in the '90s yet.

Quote:
What I recall is that it was using an internal clock with double the nominal frequency.

The 68040@25 and the 68060@50 run at the same clock of 50 MHz.

--

Power consumption consists of 2 parts.

A) Active power consumption.
The count of "bits" that flip inside the chip times the voltage you use.

The more the chip "does" = more bits flip per clock,
the higher the clock = more flips per second,
the higher the core voltage
= the more power is consumed.

You can reduce power consumption by making sure you do not "flip" bits when doing nothing.
For this you can clock-gate units that are not in use.

B) Passive leakage of power.
Leakage means power you lose even without flipping.
Leakage is a bigger problem for today's chips than for those of the '80s and '90s.

The Motorola 68040@25 and Motorola 68060@50MHz both run internally at 50 MHz.

040 = 50MHz * 5 Volt = more heat
060 = 50MHz * 3 Volt = less heat

Thanks for confirming it.

This means that 68040s should be compared to 80486s DX2 of effective clock speed.
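As a rough sketch of the arithmetic in the quote: the usual first-order CMOS model puts active power at about activity × capacitance × V² × f, so the voltage counts quadratically rather than linearly. The Python below is only an illustration. The function name and the activity and capacitance values are placeholders made up for the example, and it assumes both chips switch the same capacitance, which real 68040 and 68060 dies do not. The 5 V, 3 V and 50 MHz figures are the ones from the quote.

```python
def dynamic_power(activity, capacitance_farads, volts, hertz):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * volts ** 2 * hertz

# Placeholder values, identical for both chips so that only the voltage differs.
ACTIVITY = 0.2        # hypothetical fraction of nodes toggling per clock
CAPACITANCE = 1e-9    # hypothetical switched capacitance in farads
CLOCK = 50e6          # 50 MHz internal clock, as stated in the quote

p_040 = dynamic_power(ACTIVITY, CAPACITANCE, 5.0, CLOCK)
p_060 = dynamic_power(ACTIVITY, CAPACITANCE, 3.0, CLOCK)

# With everything else equal, the ratio depends only on the voltages: (3/5)^2
ratio = p_060 / p_040
print(f"060 active power relative to 040: {ratio:.2f}")
```

Under these assumptions the lower voltage alone brings active power down to roughly a third, before counting any clock gating.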

Status: Offline

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 14-Apr-2024 20:16:20

[ #1132 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4431
From: Germany

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:
@Gunnar

The ARM is interesting in that it has 16-bit and 32-bit length instructions, but it has to switch between 16-bit and 32-bit mode; therefore the length of an instruction is always known beforehand.

That's not correct. Thumb and Thumb-2 have both 16 and 32 bit instructions, which they can freely mix. So, those ISAs are both variable length.

The only thing is that Thumb completely lacks the ARM (native) 32-bit instructions, so if you want to use them then applications using this ISA should switch back and forth between Thumb and ARM mode.

Thumb-2 eliminated this problem, since it integrated the 32-bit instructions (but without conditional execution).
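To illustrate the variable-length point, here is a minimal sketch of how a Thumb-2 decoder can tell the two sizes apart from the first halfword alone: the top five bits 0b11101, 0b11110 and 0b11111 mark the first half of a 32-bit encoding, anything else is a 16-bit instruction. The function names are my own, and the sample halfwords are just a hand-picked stream, not taken from any real binary.

```python
def thumb2_instruction_length(first_halfword):
    """Return the instruction size in bytes, given its first 16-bit halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    # 0b11101, 0b11110 and 0b11111 mark the first half of a 32-bit encoding.
    if top5 in (0b11101, 0b11110, 0b11111):
        return 4
    return 2

def split_stream(halfwords):
    """Walk a list of halfwords and return the size of each instruction found."""
    lengths = []
    i = 0
    while i < len(halfwords):
        size = thumb2_instruction_length(halfwords[i])
        lengths.append(size)
        i += size // 2  # advance by one or two halfwords
    return lengths

# BX LR (16-bit), a BL halfword pair (32-bit), NOP (16-bit)
stream = [0x4770, 0xF000, 0xF800, 0xBF00]
print(split_stream(stream))
```

So 16-bit and 32-bit instructions sit side by side in the same stream with no mode switch in between.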

Status: Offline

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 14-Apr-2024 20:18:09

[ #1133 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4431
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@NutsAboutAmiga

And if you use the ISA enhancements that the APOLLO 68080 CPU offers then you have in all areas only advantages over the ARM.

* easier to program
* better code density
* you have more registers available
* you have both 2 OPP and 3 OPP instruction modes
* your instructions are stronger and can work directly on memory

You only have advantages

The highlighted part isn't proved.

Status: Offline

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 4:33:56

[ #1134 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@NutsAboutAmiga

Quote:
Some audio drivers do not support many channels, just 1, or only a few.

The common ones for PCI cards like SB128 do in fact only support one channel. I don't know if it's a hardware or a driver limitation. But around that time there was a move from hardware channels to software mixing.

Quote:
if that’s the case, you’re better off trying to mix the channels yourself; at least then you don’t lock other programs out.

AHI already offers mixing without needing the DIY approach. Even when using the device method it can do this without any lock out, even though it's not the recommended approach. The lock can be worked around using the device audio driver. However, it does reduce quality for some reason, and the locking limitation is a software limitation, as AHI internally uses the device itself, so it's like a false limitation. In any case, the lock is likely in place because when music is played, it should block out everything else. Just like playing modules on an Amiga.

This is easily demonstrated by playing a music video on YouTube and then starting another.

Status: Offline

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 5:22:26

[ #1135 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@cdimauro

Quote:
There's no "internal RISC core" inside x86/x64 processors: that's an urban legend!

Oh no! What was that big deal about the Pentium all about then? I recall back then what would have been the 80586 became the Pentium and blew the 68K out of the water!

Quote:
If you want to have more insight about this, you can check my series of articles about the everlasting RISCs vs CISCs dispute:

Okay I understand. It's somewhat semantics, using RISC as a simplified term to explain it. As usual, it's more technical under the hood, and it's somewhat misleading to use an ISA term to describe internal micro-ops.

I wonder where the 6502 sits? It could be described as RISC with load/store to registers. But it works as CISC with memory direct. Also, some people I've spoken to have described it as big endian, even though the data width is only 8 bits. But addresses are in lo/hi order, so I would call it little endian.
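A quick sketch of the lo/hi order, using JMP absolute as the example: JMP $1234 assembles to the bytes 4C 34 12, so the low byte of the address comes first. The helper name is just something I made up for the illustration.

```python
def decode_absolute_operand(code, offset):
    """Read the 16-bit operand that follows a one-byte opcode, low byte first."""
    low = code[offset + 1]
    high = code[offset + 2]
    return (high << 8) | low

# JMP $1234 assembles to the bytes 4C 34 12
program = bytes([0x4C, 0x34, 0x12])
target = decode_absolute_operand(program, 0)
print(hex(target))
```

Reading the same two bytes with a little-endian conversion gives the same address, which is why I'd file it under little endian.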

I would call the copper true RISC. 32 bit codes. 16 bit data. 3 instructions.
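To show how little there is to decode, here is a minimal sketch of telling the three Copper instructions apart. Each instruction is two 16-bit words: bit 0 of the first word clear means MOVE, and otherwise bit 0 of the second word picks WAIT (clear) or SKIP (set). The function names and the sample list are my own illustration, not code from any real copper list.

```python
CUSTOM_BASE = 0xDFF000  # start of the custom chip register block

def classify_copper_instruction(word1, word2):
    """Classify a Copper instruction from its two 16-bit words."""
    if word1 & 1 == 0:
        return "MOVE"  # word1 = register offset, word2 = data to write
    if word2 & 1 == 0:
        return "WAIT"  # word1 = beam position, word2 = compare mask
    return "SKIP"

def move_target(word1):
    """Absolute address of the register a MOVE writes to."""
    return CUSTOM_BASE + (word1 & 0x01FE)

copper_list = [
    (0x0180, 0x0F00),  # MOVE $0F00 into offset $180 (COLOR00)
    (0x6401, 0xFFFE),  # WAIT for a beam position
    (0xFFFF, 0xFFFE),  # conventional end-of-list WAIT
]

for w1, w2 in copper_list:
    kind = classify_copper_instruction(w1, w2)
    if kind == "MOVE":
        print(kind, hex(move_target(w1)), hex(w2))
    else:
        print(kind, hex(w1), hex(w2))
```

Two bit tests and the whole instruction set is decoded.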

Quote:
Only for SIMD registers. But FPU registers can't be directly mapped. And for (almost) directly mapping the GP registers then you need the future processors which are implementing Intel's APX "extension".

I'd say skip the vectors, leave AltiVec/VMX out of it, and use SSE+ to implement the PPC ALU instructions. Or at the least use SSE as a dumping ground to store PPC GPRs in local x64 registers. In the case of the FPU, though common, just emulate a P1022 Tabor where it does hybrid SIMD/FPU without having either!

Status: Offline

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 6:08:34

[ #1136 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@Gunnar

Quote:
What defines AMIGA? Is it how it was coded? Is it how the Amiga inventors taught people how to code it?

To be slightly technical, I would say the Amiga isn't defined by how it was coded or how the inventors taught how to code it, but it is defined by a particular memory map sitting at $DFF000 with 16 bit hardware registers that matched the MC68000 ISA it was mated with.

Quote:
If you read and understand the Amiga hardware reference manuals, what is the philosophy that the Amiga engineers teach you?

What they wanted to achieve and how it was implemented in hardware.

Quote:
Read the hardware reference manuals, then you will agree to this. Coding the hardware is the Amiga way the Amiga engineers encourage you to code!

That was before the hardware and OS were updated, which then showed all the problems arising from banging the hardware without any thought of upward compatibility. For example, assuming an Amiga will only have chip RAM. Or assuming the ROM is always exactly the same code. When the OS is going to be taken over, there's no point trying to be smart and picking out the direct ROM routines they like when a boot block already has access to the OS and can call OS functions normally before giving it the boot.

Quote:
Do you not like this? You can like what you want - but this does not change the fact that coding the hardware was always part of Amiga spirit - and encouraged by the fathers of the Amiga.

It doesn't worry me. I've written code that takes over the system and banged hardware. I also had a running system so made sure it didn't kill the OS and made use of the OS to reset the screen before I took over.

Quote:
Why do you want to forbid people from using the Amiga the way the fathers of the Amiga envisioned it?

When did I say I want to forbid people doing that? They can do what they like. But the context of this is not the original Amiga. This relates to expanding the Amiga chipset. Decades later. By taking Paula and expanding her sound capabilities to increase channels and resolution in the Vampire chipset.

So how do you expect any programs, games or software, to use the features of the Vampire or Pamela if they are only coded for an A1200 or A500? Most games targeting ECS or AGA need more than a simple reassemble and need an overhaul of the code alone to deal with specific features. And in the case of Pamela, it's in an entirely different region of the hardware map of SAGA. Code would need to be rewritten to take advantage of all those 16 channels. OTOH, code using an independent driver, where it can be configured for how many channels it uses, needs fewer changes and sometimes just trivial ones.

Quote:
Both were allowed on Amiga, and I think people should have the right to code how they like.

Either way, one way tends to lock code to one hardware model and works best for that model; the other way is better if compatibility is desired and the trade-off doesn't impact performance.

Status: Offline

Karlos

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 9:14:51

[ #1137 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4954
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

RISC and CISC are somewhat arbitrary labels when you get right down to it.

RISC was a set of architectural design choices to simplify the processor hardware and export any complexity out to the compiler. Anything that didn't follow this was then retrofitted with the CISC moniker, as if the entire set of choices for RISC were somehow mutually exclusive with any other. That has never been true in reality.

You can have your cake and eat it. It all depends on how much R&D you can throw at a problem.

_________________
Doing stupid things for fun...

Status: Offline

kolla

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 11:01:50

[ #1138 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3473
From: Trondheim, Norway

@cdimauro

Quote:
This means that 68040s should be compared to 80486s DX2 of effective clock speed.

Many Mac users did this in the “90th” (SIC!) and got ridiculed… so by all means, go ahead!

You may find some of the information in this thread enlightening
https://68kmla.org/bb/index.php?threads/multiprocessor-se-30.31979/

Last edited by kolla on 20-Apr-2024 at 11:28 AM.

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Status: Offline

Hypex

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 14:42:09

[ #1139 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@Hammer

Quote:
PowerPC Amiga NG has cut off the Amiga Zorro II/III add-on hardware that anchored the Amiga platform in its professional market niche.

This is one of the things that became obvious in the early days. The super Amiga cards could not be plugged in and this hampered its entry into the Amiga market. However, Zorro would have looked like ISA slots against the current PCI, but some PC boards would have had ISA and PCI. Had they been able to engineer it, Zorro slots would have been a good addition, though given it was a decade later, a Zorro based PCI slot would have made more sense. At the least, a Zorro PCI bridge card would have helped.

The only TV video card I recall was some popular BT878 TV card that had TV input. But, were there PCI versions of the Toaster? Sure not the Amiga deal. But being able to run the Toaster software with a PCI Toaster would have helped a great deal. I mean, the Toaster used the Amiga as a dongle, so did the card interface really matter?

Status: Offline

OneTimer1

Re: some words on senseless attacks on ppc hardware
Posted on 20-Apr-2024 16:27:58

[ #1140 ]

Super Member

Joined: 3-Aug-2015
Posts: 1254
From: Germany

@Thread
Quote:

PowerPC Amiga NG has cut off the Amiga Zorro II/III add-on hardware that anchored the Amiga platform in its professional market niche.

I would have said the same in 1995, but when AmigaOnes or Pegasos surfaced, those Zorro2/3 cards were outdated and more expensive than everything for PCI.

Quote:

But, were there PCI versions of the Toaster?

The Toaster was dead in the water when PCI hardware became available, the way video is treated in studios had switched from analog to digital and a fast ethernet card was the best video adapter you could get.

And don't forget, Video Toaster was NTSC only.

Last edited by OneTimer1 on 20-Apr-2024 at 04:31 PM.

Status: Offline


Amigaworld.net was originally founded by David Doyle