Amigaworld.net - The Amiga Computer Community Portal Website

some words on senseless attacks on ppc hardware

NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 18-Mar-2024 22:31:03

[ #1101 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12825
From: Norway

@vox

Quote:
Does Petunia have penalty to native PPC software? I suppose to some extent.

Well, it's not responsible for PowerPC software, and it can be disabled.
If there is a penalty, it's the memory it consumes.

Quote:
Tried to use NallePuh and CIA Agent to make OS4 more Amiga friendly, but did not help.

In that case you're using an old version.

Quote:
Enhancer is not official part of OS4 and best to my knowledge WOS support is not reimplemented. Warp3D support is.

Enhancer contains the RX and HD drivers. The older graphics card drivers and the HD driver already have Warp3D, so there is no need for Enhancer to emulate Warp3D; however, the one in Enhancer is more compatible.

But it is a useless point when all the old Warp3D games worth playing are being recompiled for AmigaOS 4.1 FE Warp3D Nova.

I agree MUI is a mess. As I understand it, it was open sourced, but they don't have a license to distribute a registered MUI, and you can't register MUI; as a result it looks pretty ugly on AmigaOS4, with no styling, textures, etc. That's why I like ReAction better: it's official and supported.

Last edited by NutsAboutAmiga on 18-Mar-2024 at 10:47 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

matthey

Re: some words on senseless attacks on ppc hardware
Posted on 18-Mar-2024 22:51:09

[ #1102 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2024
From: Kansas

Hammer Quote:

From the Greek meaning "not divisible into smaller parts".

An "atomic" operation is always observed to be done or not done, but never halfway done.

An atomic operation must be performed entirely or not performed at all.

In multi-threaded scenarios, a variable goes from unmutated to mutated directly, with no "halfway mutated" values.

I understood the definition of atomic given above from your original post. The more common technical definition of an atomic operation in recent years is a locked memory access, as they say (TAS, CAS, CAS2 instructions on the 68k). However, basic instructions and their operations can be considered atomic instructions and operations. The 68k RMW instructions are atomic operations, as used on the 68k Amiga for library open counter incrementing and decrementing, but they are not locked atomic operations (x86 allows many memory access instructions to be locked with a prefix, which is nice from a programmer perspective but is a hardware design impediment). RISC instructions separate load/store and ALU operations, so they need load-reserve/store-conditional instructions (lwarx/stwcx. on PPC), followed by a check of the condition for success and a branch on failure. RISC has smaller, simpler atomic operations, which in the case of PPC require 4 instructions using 16 bytes of code to replace one 68k instruction using 2 bytes of code.
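
A toy Python model may help picture the difference described above. It is only a sketch under stated assumptions: `ToyMemory`, `load_reserved`, `store_conditional` and `llsc_increment` are hypothetical names, the reservation logic is heavily simplified (one word, one CPU, no cache lines), and it is not how any particular PPC core is built. The point it shows is that a load-reserve/store-conditional pair only succeeds if nothing else wrote the word in between, so the increment has to loop, whereas a single-instruction memory RMW on a single 68k core is not split by an interrupt, since 68k interrupts are recognized at instruction boundaries.

```python
class ToyMemory:
    """One memory word plus one reservation flag.

    Hypothetical model loosely inspired by PPC lwarx/stwcx.;
    reservation granules, cache lines and multiple CPUs are ignored.
    """

    def __init__(self, value=0):
        self.value = value
        self.reserved = False

    def load_reserved(self):
        # Like lwarx: read the word and set a reservation on it.
        self.reserved = True
        return self.value

    def store_conditional(self, new_value):
        # Like stwcx.: store only if the reservation is still held.
        if not self.reserved:
            return False
        self.value = new_value
        self.reserved = False
        return True

    def plain_store(self, new_value):
        # Any other store to the word (e.g. from an interrupt handler)
        # clears the reservation.
        self.value = new_value
        self.reserved = False


def llsc_increment(mem, interrupt=None):
    """Increment mem.value with a load-reserved/store-conditional loop.

    'interrupt', if given, is called once between the load and the
    conditional store to simulate an interrupt that writes the same word.
    Returns how many attempts were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_reserved()            # lwarx
        if interrupt is not None:
            interrupt(mem)
            interrupt = None                 # only interfere once
        if mem.store_conditional(old + 1):   # stwcx., then branch back on failure
            return attempts


if __name__ == "__main__":
    mem = ToyMemory(0)
    tries = llsc_increment(mem, interrupt=lambda m: m.plain_store(m.value + 10))
    print(mem.value, tries)  # 11 2
```

In the example, an interrupt-like write of 10 lands between the load and the conditional store, so the first attempt fails and the retry produces 11 on the second attempt.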

One of the reasons why the technical definition of an atomic operation usually refers to a locked mem access today is because of multicore systems. The scope of an operation or instruction was originally for a single CPU core and today the scope is often a multicore SMP system. The LLVM compiler recently received "modern" 68k "atomic instruction" support which gives an idea of the perspective.

https://m680x0.github.io/blog/2023/05/may-updates.html Quote:

Atomic Instructions

Atomic instructions are commonly seen in modern architectures to perform indivisible operations. However, historically speaking, atomic instructions have never really been a thing for m68k, since processors in this family are predominantly single-core, which is the model we primarily focus on in this project. That said, as a backend we still need to lower atomic instructions passing from earlier stages in the compilation pipeline. Otherwise, LLVM will simply bail out with a crash.

For atomic load and store, the stories are a lot simpler: due to the aforementioned single-core nature, lowering them to normal MOV instructions should be sufficient, which was something D136525 did. In the same patch, the author, Sheng, also dealt with something more tricky: atomic compare-exchange (cmpxchg) and its friends, like atomic fetch-and-add (or add-and-fetch). Despite being single-core, the processor can still run multi-tasking systems. So we need to make sure an atomic cmpxchg is immune to system routines like interrupts and/or context-switching. To this end, 68020 and later processors are equipped with the CAS instruction, which can be used as the substrate for fetch-and-X instructions, in addition to implementing cmpxchg. For older processors, we expanded these instructions into lock-free library calls (i.e. __sync_val_compare_and_swap and __sync_fetch_*). In addition, this patch also lowered atomic read-modify-write (RMW) and any atomic operations larger than 32 bits into library calls of libatomic, which are not lock-free. Last but not the least, 85b37d0 added the lowering for atomic swap operations.
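
The idea of CAS as "the substrate for fetch-and-X" can be sketched in Python as a model. `val_compare_and_swap` below only imitates the documented behaviour of GCC's `__sync_val_compare_and_swap` builtin (store the new value if the current one matches, and return the value seen beforehand); the indivisibility that a real CAS instruction provides is simply assumed, and `Word`, `fetch_and_op`, `fetch_and_add` and `fetch_and_or` are hypothetical names.

```python
import operator


class Word:
    """A single memory word (toy model)."""

    def __init__(self, value=0):
        self.value = value


def val_compare_and_swap(word, expected, new):
    """Imitates __sync_val_compare_and_swap semantics: store 'new' only if
    the current value equals 'expected'; always return the prior value.
    A real CAS does this indivisibly; here that is simply assumed."""
    old = word.value
    if old == expected:
        word.value = new
    return old


def fetch_and_op(word, op, operand):
    """Generic fetch-and-X built on CAS. Returns the value before the update."""
    old = word.value
    while True:
        seen = val_compare_and_swap(word, old, op(old, operand))
        if seen == old:
            return old      # CAS succeeded: nobody changed the word meanwhile
        old = seen          # lost a race: retry using the value just seen


def fetch_and_add(word, n):
    return fetch_and_op(word, operator.add, n)


def fetch_and_or(word, mask):
    return fetch_and_op(word, operator.or_, mask)


if __name__ == "__main__":
    w = Word(5)
    print(fetch_and_add(w, 3), w.value)     # 5 8
    print(fetch_and_or(w, 0x10), w.value)   # 8 24
```

Python integers do not wrap, so 32-bit overflow is not modelled, and in this single-threaded sketch the retry branch is never taken; it is there to show the shape of the loop a compiler or library would build around CAS.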

D146996 was dealing with a similar puzzle: atomic fence. As mentioned before, we don't need to worry about the memory operation order in an in-order single-core processor, like most members of the 68k family. Thus, this patch only needs to prevent compiler optimizations from reordering instructions across the atomic fence. I believe there is definitely a more sophisticated solution, like adding dependencies (e.g. SelectionDAG chains) between instructions placed before and after the fence… but, well, I was lazy so I literally copied what m68k GCC did: lower atomic fence into an inline assembly memory barrier a.k.a. asm __volatile__ ("":::"memory") (more precisely, an inline assembly instruction in LLVM's MachineIR).

That said, if we want to deal with potentially-out-of-order 68060 processors in the future, we might need to lower any fence into a NOP, which has the syntax of synchronizing the pipeline.

Min talks about locked instructions like CAS, but for the scope of a single-core processor, non-locked CISC memory accesses can often be used, as the 68k Amiga did. One thing he misses is the 68k history, when the 68k was a leading multiprocessor workstation competitor at a time when most microprocessors did not have this support. Did he know about this paper?

A Lock-Free Multiprocessor OS Kernel
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=3b89809e47ac0c87f224d58e595ccc39dc60b1b8

The paper is good but maybe not too helpful because Motorola CAS2 support was buggy in the 68040 and then dropped on the 68060. CPU32 and ColdFire didn't support CAS or CAS2.

Hammer Quote:

The fixed length has influenced on pipelining i.e. reduced complexity and reaching higher clock speed. Refer to DEC Alpha's RISC concept.

Hammer Quote:

Classic Pentium has a superscalar pipeline design.

68060's FPU wasn't pipelined.

From Stanford University

Why is RISC better for pipelining?

Pipelining
Because RISC instructions are simpler than those used in pre-RISC processors (now called CISC, or Complex Instruction Set Computer), they are more conducive to pipelining. While CISC instructions varied in length, RISC instructions are all the same length and can be fetched in a single operation.

A fixed-length encoding takes a roughly fixed amount of time to fetch and decode instructions, which is easier to pipeline. Variable-length encodings take a variable amount of time to fetch and decode, which is more difficult to pipeline efficiently, as bubbles enter the pipeline when instructions can't be fetched or decoded in time. Large instruction fetch sizes were initially used to minimize bubbles from large instructions, but this increased power use and heat production. This was a problem for the 68040, which runs hot, and some common instructions still take longer than a cycle to execute.

The solution is, of course, decoupling the instruction fetch pipeline (IFP) from the operand execution pipelines (OEPs) with an instruction buffer, as the 68060 does. The IFP can have a smaller and more power-efficient fetch that takes multiple cycles to decode large instructions, while the OEPs can execute instructions as long as the instruction buffer is not empty. If the OEPs stall, the IFP continues to fill the instruction buffer. The IFP and OEP pipelines are like separate units that are pipelined separately and can operate in parallel, except when the instruction buffer is empty, when they act like one continuous pipeline. Of course it is possible to have multiple parallel pipelines, like OoO CPU cores and 68k FPUs, where instructions execute in parallel. Pipelining uses more transistors, and so does that decoupling instruction buffer.

The funny thing is that most superscalar high-performance RISC cores now use a decoupled instruction buffer too. Superscalar cores can execute a variable number of instructions, which makes them more difficult to pipeline, and decoupled pipelines with an instruction fetch buffer even out the inconsistency and allow a smaller and more efficient instruction fetch, even for in-order RISC cores using a fixed instruction length. There are a few other advantages, like branch prediction optimizations in the IFP if the OEPs stall.
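
The coupled versus decoupled argument can be explored with a very small cycle-counting model. This is a sketch under heavy assumptions, not a model of the 68040 or 68060: every instruction is a made-up (length in bytes, execute cycles) pair, an instruction may issue in the same cycle its last byte arrives, there are no branches or caches, and `coupled`/`decoupled` are hypothetical helper names. In the coupled version fetch waits for execution; in the decoupled version the fetch side keeps filling a buffer while a multi-cycle instruction executes.

```python
import math


def coupled(program, fetch_width):
    """Fetch and execute in lockstep: an instruction must be fully fetched
    before it issues, and fetch waits while execution is busy.
    program: list of (length_bytes, exec_cycles). Returns (cycles, bubbles)."""
    cycles = bubbles = 0
    for length, exec_cycles in program:
        fetch_cycles = math.ceil(length / fetch_width)
        # The last fetch cycle overlaps the first execute cycle.
        cycles += fetch_cycles + exec_cycles - 1
        bubbles += fetch_cycles - 1   # execute side idle while fetching
    return cycles, bubbles


def decoupled(program, fetch_width, buffer_size):
    """Fetch side fills an instruction buffer every cycle it has room,
    independently of the execute side. Returns (cycles, bubbles)."""
    assert buffer_size >= max(length for length, _ in program)
    total_bytes = sum(length for length, _ in program)
    fetched = buffered = 0
    next_insn = busy = 0
    cycles = bubbles = 0
    while next_insn < len(program) or busy > 0:
        cycles += 1
        # Instruction fetch: keeps going even while execution is stalled.
        chunk = min(fetch_width, buffer_size - buffered, total_bytes - fetched)
        fetched += chunk
        buffered += chunk
        # Execution: finish a multi-cycle instruction, issue the next one
        # if all of its bytes are buffered, or record a bubble.
        if busy > 0:
            busy -= 1
        elif buffered >= program[next_insn][0]:
            length, exec_cycles = program[next_insn]
            buffered -= length
            busy = exec_cycles - 1
            next_insn += 1
        else:
            bubbles += 1
    return cycles, bubbles


if __name__ == "__main__":
    prog = [(2, 1), (6, 1), (4, 3), (6, 1), (2, 1), (2, 1)]
    print("coupled,   4 B/cycle:", coupled(prog, 4))
    print("decoupled, 4 B/cycle:", decoupled(prog, 4, 16))
    print("coupled,   8 B/cycle:", coupled(prog, 8))
```

For this made-up instruction mix, the coupled model needs 10 cycles with 2 bubbles at 4 bytes/cycle, the decoupled model needs 8 cycles with none, and the coupled model only gets back to 8 cycles with an 8 byte/cycle fetch. That illustrates the mechanism; it is not evidence about real chips.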

The 68040 was a good example of a CISC CPU core that was difficult to pipeline. The 68060 was instead a good example of a futuristic design for CISC and RISC cores. There was no more variable-length encoding pipelining problem in the 68060, which had so many design elements of future cores, but there was no longer a market for a relatively simple in-order core that outperformed more complex OoO (PPC) cores, except for the embedded market where high clock speeds were not needed. The 68060, with its 8-stage pipeline (deep for the time), was only ever clocked to 50MHz, which is a shame or PPC sham. The 68060 should have been out-clocking the 3-6 stage pipeline CPUs of the day, including most simple RISC cores, further increasing its integer performance advantage and reducing any FPU performance disadvantage of its minimalist FPU. There was plenty of room for improvement too, as the 68060 is a relatively simple, power-efficient first design.

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 3:12:49

[ #1103 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@matthey

Quote:
Large instruction fetch sizes were initially used to minimize bubbles from large instructions but this increased power used and heat production. This was a problem for the 68040 which runs hot and some common instructions still take longer than a cycle to execute.

You get a bonus point for this creative answer.

But the real reason why the first 040 ran hot while the first 060 ran cooler is very simple: the production process. The first 040s were made in the old 5 V process, which runs hotter, while the 060 used the newer 3.3 V process, which runs cooler. So:

No.

The reason the 68040 runs hot is simply that it was produced in an old silicon process still using 5 V technology.

Motorola later made the 68040V using 3.3 Volt technology -
and this one runs cold like an icecube.

Last edited by Gunnar on 19-Mar-2024 at 04:03 PM.

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 4:44:10

[ #1104 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@Gunnar

Quote:

Gunnar wrote:
@matthey

Quote:
Large instruction fetch sizes were initially used to minimize bubbles from large instructions but this increased power used and heat production. This was a problem for the 68040 which runs hot and some common instructions still take longer than a cycle to execute.

No.

The reason the 68040 runs hot is simply that it was produced in an old silicon process still using 5 V technology.

Motorola later made the 68040V using 3.3 Volt technology -
and this one runs cold like an icecube.

The 68040V lacks the FPU; it's functionally a 3.3 V 68LC040. The main reason I ditched my 68LC060 for the full 68060 is that Pentium-era PC game ports and 68060 Amiga demos need a fast FPU.

LC060 is treated like a fast 030 CPU.

Last edited by Hammer on 19-Mar-2024 at 04:49 AM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 5:06:05

[ #1105 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@Gunnar

Quote:

Gunnar wrote:
@Hammer

Quote:
Pentium IV Willamette has fixed-length micro-ops and a 20-stage pipeline.
For an apples-to-apples comparison using the same Motorola 130 nm SOI process tech: the PowerPC 7447 @ 1.33 GHz was fabbed on Motorola's 130 nm SOI process.
PowerPC 7447A reached 1.67 GHz.
AMD licensed Motorola's 130 nm SOI process tech for Athlon 64 CPUs, and its fastest SKU was the 2.4 GHz Athlon 64 4000+.

That "fixed length" instructions are needed for high clock rate is a MYTH

High clock rate depends only on pipeline length and the process used.

Show me a CPU product with non-fixed-length uops running at 3 GHz.
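
Both positions can be checked against the usual first-order textbook model of pipelining: the clock period is the logic delay of the slowest stage plus a fixed per-stage overhead for latches and clock skew. The sketch below uses made-up picosecond figures and hypothetical function names. It shows why more stages raise the clock with diminishing returns, and why an unevenly split stage caps the clock regardless of stage count; process technology enters through both the logic delay and the overhead.

```python
def clock_mhz_even(total_logic_ps, stages, overhead_ps):
    """Logic delay split evenly across 'stages', plus a fixed per-stage
    overhead (latch delay, clock skew). Returns the clock in MHz."""
    period_ps = total_logic_ps / stages + overhead_ps
    return 1e6 / period_ps   # 1 / (period in ps), expressed in MHz


def clock_mhz_uneven(stage_logic_ps, overhead_ps):
    """With an uneven split, the slowest stage sets the clock."""
    period_ps = max(stage_logic_ps) + overhead_ps
    return 1e6 / period_ps


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, "stages:", round(clock_mhz_even(10_000, n, 500)), "MHz")
    # Same 10,000 ps of logic over 5 stages, but one stage is slow:
    print("uneven:", clock_mhz_uneven([2000, 2000, 3500, 1500, 1000], 500), "MHz")
```

With 10,000 ps of logic and 500 ps of overhead, 5 even stages give 400 MHz and 20 give 1000 MHz, but in this model no stage count can exceed 1/overhead = 2000 MHz, and the uneven 5-stage split drops to 250 MHz because of its 3500 ps stage.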

matthey

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 5:33:38

[ #1106 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2024
From: Kansas

Gunnar Quote:

No.

The reason the 68040 runs hot is simply that it was produced in an old silicon process still using 5 V technology.

Motorola later made the 68040V using 3.3 Volt technology -
and this one runs cold like an icecube.

Moving to 3.3V and a 500nm fab process for the 68040V was of course a major reason it runs much cooler. This is a similar chip fab process to the 68060 while most 68040s are 5V and 800nm with later chips using a 650nm process. The 68040V was certainly much cooler using 1.5W@33MHz but was significantly more expensive and came later (many customers chose to wait for the higher performance 68060 in development). Most 68k Amiga users have never heard of a 68040V which is an indicator that it was for specialized applications and not for mainstream use. There were some design changes for the 68040V like a move to a fully static design but unfortunately no decoupled IFP and OEPs likely because it would have been a major design change. The later early ColdFire XCF5102 CPU has what looks like a decoupled 2 stage IFP and 4 stage OEP pipeline (6 stages like the 68040 but decoupled like the 68060). Early versions of some XCF5102 CPU chips were of course labeled MC68040VL so we know where Motorola was headed with the 68040 as they tried to reduce power, sacrificing valuable 68k compatibility in the process.

https://www.cpushack.com/2019/11/01/cpu-of-the-day-motorola-mc68040vl/

Instruction fetch uses a considerable amount of power for small CPU cores (studies: instruction cache and fetch used 27% of power on a StrongARM processor; instruction fetch and decode can consume more than 40% of power). CISC high-performance cores had to have a large fetch to have a chance at efficiently pipelining larger variable-length instructions without a lot of bubbles in the pipeline. Any instruction that can't be fetched in a single cycle can't be fed into the pipeline in a single cycle, thus creating a bubble that reduces performance. This was unfortunate, as variable-length instructions save instruction cache space, reducing power. The decoupled IFP and OEPs give variable-length encodings all the advantages of code compression without requiring large fetches. The transistors for an instruction buffer are more than offset by the transistor savings from smaller instruction caches. The 68060 performance with just a 4 byte/cycle fetch is evidence of the effectiveness of the decoupled IFP and OEPs. Sure, an 8 byte/cycle fetch would likely be necessary for a significantly higher-performance enhanced 68060, but the 4 byte/cycle fetch was obviously not as much of a bottleneck as some people claim. The 68060 instruction fetch was smaller than that of practically all superscalar CPUs at the time, saving power. One of the best code densities further aided instruction cache efficiency. The 68060 was a great balanced processor design with both good performance and power efficiency.

There are likely other reasons the 68040 runs hot like less power gating than the 68060. The 68040 design may have targeted high performance but delivered as much heat as performance. Further, design, development and production delays set the 68k roadmap back years and likely led to the downfall of the 68k. Reducing 68040 power would have likely required planning ahead for a combination of lowering the voltage (likely using a more expensive fab process), more power gating and the decoupled IFP and OEP. The 68060 received all these and was the Pentium killer Motorola needed before Apple moved to PPC and C= and Atari disappeared.

Last edited by matthey on 19-Mar-2024 at 05:37 AM.

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 5:55:34

[ #1107 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

Quote:
Intel has published direct SSE versus AVX comparisons with benchmarks run on the same second-generation Core i7 processor with FFT performance improvements ranging from 1.2 to 1.8x.

Intel Core i3 N series 100% Gracemont E-Cores don't have native 256-bit hardware units.

Intel's Thread Director is a workaround to minimize E-Core's involvement with modern PC games.

And... WHO CARES?!?
Quote:
IBM POWER8's VSR has 64 128-bit SIMD registers which have a total of 8,192 bits at the front end. POWER8 was launched in 2014.

POWER8 8 E4E5 's six cores (up to 48 SMT) @ 4.1 Ghz are relatively cheap, but the problem is the low-cost motherboard with 1 PCIe 16X slot and 2 to 3 PCIe 1X slots.
Should have open-source POWER8 motherboard design for unrestricted clones.

POWER8 can be configured with SMT8, SMT4, SMT2, SMT1 (ST) modes.

Per POWER8 has 16 execution units:
2 Fixed point units.
2 Load store units (can also execute simple fixed-point operations).
2 Load units (can also execute simple fixed-point operations).
4 Double precision floating point (can act as eight single-precision pipelines).
2 Vector unit 128-bit VMX/AltiVec.
1 Crypto (AES).
1 Branch.
1 Condition register.
1 Decimal floating point unit.
From https://www.7-cpu.com/cpu/Power8.html

Intel SkyLake X was launched in 2017.
Intel Haswell was launched in 2013.

From the front-end instruction set, IBM POWER8 (8,192 bits) has the edge over Intel's Haswell's 16 registers 256-bit AVX2 (4,096 bits), but Haswell has two FMA3 256-bit AVX units.

ROFL. You're counting the bits used for all SIMD registers and using the totals to compare two architectures.

So, you just count them and completely ignore how they are used. STRA-LOL.

What do you think, that the bits used on registers are like the potatoes sold at the market around the corner? The more you have, the better?

This is THE clear measure of how you do NOT understand, at all, architectures.

I add nothing else, because enough is enough.
Quote:
Intel Haswell's implementation has the following vector execution engines
Port 0, 256-bit FMA with FBlend, 256-bit VMUL/VShift
Port 1, 256-Bit FMA with FADD, 256-Bit VALU with FBlend
Port 5, 256-Bit FShuffle with FBlend, 256-bit VALU VShuffle

Intel Haswell also has HD 4000 IGP with up to 332.8 GFLOPS FP32.

The instruction set may not reflect the actual hardware implementation.

Thanks for your meaningless PADDING: I was missing it...

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 6:05:22

[ #1108 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

Quote:

Those are ALL the references about AVX-128, AVX-256, AVX128 and AVX256 which are found in that manual and all of them belong to the MICROARCHITECTURE, as it's clearly reported even in the manual by USING EXACTLY THIS TERM!

So, we're NOT in the ISA = Instruction Set Architecture = Architecture domain! That's the MICROARCHITECTURE domain!

You continue to mix both things because you do NOT UNDERSTAND, at all, such technical things. Which are clearly outside of your limited capacities. You're HOPELESS!

For transparency, my argument is MICROARCHITECTURE.

No, your argument is CONFUSION, since you talk about microarchitectures when the discussion was about architectures.
Quote:
Your instruction set focus hides the actual MICROARCHITECTURE implementation!

OBVIOUS, new Messier de La Palice: because those are TWO DIFFERENT THINGS!

As I've said, I do NOT mix APPLES and ORANGES.
Quote:
When I used AMD Jaguar's example, I was referring to the MICROARCHITECTURE implementation

And WHO CARES? The discussion was another one.
Quote:
to debunk your stupid "256-bit AVX" instruction set marketing.

STRA-LOL. You don't miss any opportunity to show how ignorant you are.

AVX WAS, IS and WILL BE a 256-bit ISA (extension). That's NOT marketing: THAT'S HOW THIS EXTENSION IS! Since DAY ZERO!
Quote:
I don't support 256-bit AVX instruction set marketing when its corresponding 256-bit hardware is NOT guaranteed by Intel e.g. Gracemont MICROARCHITECTURE. AMD has abandoned 128-bit SIMD-equipped CPUs.

Then tell me more about Skylake-X: do you know that its AVX-512 implementation is NOT fully 512 bits?

And what about AMD's processors? Have you skipped Zen 1 & 2, since their AVX implementation wasn't fully 256 bits?

What about AMD's Zen 4, since its AVX-512 implementation is NOT fully 512 bits?

I assume that you have NOT bought ANY of those processors, right? Because you're a coherent guy and you:

"don't support XYZ-bit AVX* instruction set marketing when its corresponding XYZ-bit hardware is NOT guaranteed"

Could you please confirm my above question? Have you bought one of those processors?

BTW, YOU've said that Pentium III and IV had 64-bit SIMD implementations. I've asked you if you're sure that they were only 64-bit, but I'm still missing an answer.
Do you know the internals of those processors? Have you ever taken a look at them? Since you seem to be all about microarchitectures and you were so sure when you were talking about those processors, it should be really easy for you to give an answer, right?
I'm waiting here, mister "Atomic"...
Quote:
You can jump up and down about AVX's "256-bit" superiority

Let me reveal a secret: AVX has been 256-bit since it was first designed...
Quote:
and it wouldn't change the fact that Intel regressed into a 128-bit hardware implementation.

It doesn't change the fact that it's totally irrelevant.
Quote:
---------

From the surface, IBM POWER8's 64 128-bit VSR register's total size is 8,192 bits which is superior when compared to Intel's 16 registers 256-bit AVX/AVX2's 4,096 bits.

RI-LOL: again, you're measuring and comparing two architectures by just taking the number of bits of their registers.

Your ignorance has no limits...
Quote:
Did you forget IBM's POWER8 VSR improvements?

RI-RI-LOL: I was the one who FIRST reported its existence in this discussion!

And YOU've quoted a comment of mine where I've already given full details about this ISA SIMD extension as well as shown its details: https://amigaworld.net/modules/newbb/viewtopic.php?mode=viewtopic&topic_id=45085&forum=33&start=1060&viewmode=flat&order=0#869150

That wasn't even one day before! But you completely "forgot" it and now you're pointing the finger at me for the same.
Quote:
https://www.ibm.com/support/pages/vectorizing-fun-and-performance
POWER8 has 64 "vector-scalar registers' (VSRs), the first 32 of which are shared space with the 32 floating point registers. Each VSR is 128-bit.

For vector data register storage at the instruction set architecture level, POWER8 was superior until SkyLake X's AVX-512 release during the year 2017.

RI-RI-RI-ROL: you continue to compare architectures just counting the total number of bits of SIMD registers.

Potatoes! Potatoes! Potatoes! I'm the king of potatoes because I've the biggest amount of them!
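
The register-file totals being argued about are plain products of architectural register count and register width. A short Python check follows; the AVX-512 line (32 ZMM registers of 512 bits in 64-bit mode) is added for comparison with the Skylake-X remark, and the AVX/AVX2 figure assumes 64-bit mode (16 YMM registers). These totals describe the programming model only and say nothing about execution width, register renaming or how the registers are actually used.

```python
# Architectural SIMD register files: (number of registers, bits per register).
REGISTER_FILES = {
    "POWER8 VSX (VSRs)": (64, 128),
    "x86-64 AVX/AVX2 (YMM)": (16, 256),
    "x86-64 AVX-512 (ZMM)": (32, 512),
}


def total_bits(count, width):
    """Total architectural register state in bits."""
    return count * width


if __name__ == "__main__":
    for name, (count, width) in REGISTER_FILES.items():
        print(f"{name}: {count} x {width} = {total_bits(count, width)} bits")
```

This prints 8192 bits for the POWER8 VSRs, 4096 for AVX/AVX2 and 16384 for AVX-512.
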
Quote:
https://www.ibm.com/support/pages/vectorizing-fun-and-performance
Instruction set architecture with performance debate is nearly meaningless when MICROARCHITECTURE's design is factored in.

Of course, new Messier de La Palice: it's because the ISA has NOTHING to do with performance. GENIUS!
Quote:
Hammer wrote:
@cdimauro

Quote:

Frontend = MICROarchitecture.

The front end contains the logical instruction set architecture's register set enforcement e.g. 16 register programming model behavior.

"logical instruction set architecture"?!? Is it another completely invented non-sense just to show how much ignorant you are since you don't know the correct wording found on literature?

And what about the "enforcement"? Please clarify, so that I can continue to laugh like a laughing hyena...
Quote:
The programmer doesn't have access to the larger register count from register renaming hardware.

Specify: YOU don't know how to do it.

Despite being part of the microarchitecture, this information is usually provided by the processor vendor. And if not, a good estimate is possible (sites like Anandtech have applications for calculating it).

Contrary to the common belief, programmers CAN take advantage of microarchitectures. There's PLENTY information which is found on... rolling drum... the Optimization manuals. Information that is used by compilers, for example, to generate better optimized code for a specific microarchitecture (or a family of them).
Quote:
Quote:

You continue to have no clue at all of the topic, since you continue to completely mix (and do NOT understand) ISA = Instruction Set Architecture and Microarchitectures.

Again, your 256-bit AVX instruction set argument is useless

I agree here: it's useless, because the AVX ISA is 256-bit since the beginning and even the stones know it. The stones, but NOT you.
Quote:
when the current Intel administration doesn't even guarantee corresponding 256-bit hardware across its latest CPU products.

See above: WHO CARES?!?
Quote:
Stop drinking the marketing Kool-Aid.

Why should I stop? You're making my day every day.

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 6:06:35

[ #1109 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Hammer

Quote:

Hammer wrote:
@Gunnar

Quote:

Pipelining is the term for "dividing" the amount of work an instruction does into smaller steps.
The smaller the steps, the higher your clock can be.

That's one aspect of it. Zen 4C's compact size has lowered its high clock speed potential when compared to the larger area size Zen 4.

Both Zen 4C and Zen 4 have the same pipeline stages.

My secondary CCDs for my Ryzen 9 7950X and Ryzen 9 7900X have lower clock speeds when compared to the 1st "good" CCD silicon.

Hey, have you bought processors that have AVX-512 but implemented only with 256 bits? Really? "Coherent"...
Quote:
Quote:

Every CPU can do pipelining - both RISC and CISC do pipelining.

Classic Pentium has a superscalar pipeline design.

68060's FPU wasn't pipelined.

What a mess. You jump from pipelining to superscalar pipelines with nonchalance: they are the same thing to you. Just to show, again, how ignorant you are.

BTW, even the 6502 was pipelined...
Quote:
From Stanford University

Why is RISC better for pipelining?

Pipelining

That's the old RISC propaganda that they continue to spread around even though it hasn't applied for several decades. See below.
Quote:

Because RISC instructions are simpler than those used in pre-RISC processors (now called CISC, or Complex Instruction Set Computer), they are more conducive to pipelining.

Please, tell me more about this "RISC" instruction: https://keleshev.com/ldm-my-favorite-arm-instruction/

Or a more modern, very fresh, one: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions

Do they fit the "RISC" definition?
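
To see why LDM stretches the textbook RISC definition, here is a toy Python model of ARM's `LDMIA Rn{!}, {reglist}` (load multiple, increment after). It is a sketch, not a full implementation: the function name and data layout are hypothetical, and it ignores special cases the ARM architecture defines (loading PC, which also branches, and writeback when the base register is in the list). One instruction performs one memory load per bit set in a 16-bit register mask, so the amount of work varies from instruction to instruction.

```python
def ldmia(regs, mem, base, reg_mask, writeback=False):
    """Toy model of ARM 'LDMIA Rn{!}, {reglist}'.

    regs: list of 16 register values (R0-R15); mem: dict address -> word;
    reg_mask: bit i set means "load Ri". The lowest-numbered register is
    loaded from the lowest address, starting at regs[base].
    Not modelled: loading R15 (which also branches) and other special cases.
    Returns the number of words loaded.
    """
    if writeback and reg_mask & (1 << base):
        raise ValueError("base register in list with writeback: not modelled")
    addr = regs[base]
    count = 0
    for i in range(16):
        if reg_mask & (1 << i):
            regs[i] = mem[addr]
            addr += 4
            count += 1
    if writeback:
        regs[base] += 4 * count    # the '!' form updates the base register
    return count


if __name__ == "__main__":
    regs = [0] * 16
    regs[0] = 0x100
    mem = {0x100: 11, 0x104: 22, 0x108: 33}
    # LDMIA R0!, {R1, R2, R4}
    n = ldmia(regs, mem, 0, (1 << 1) | (1 << 2) | (1 << 4), writeback=True)
    print(n, regs[1], regs[2], regs[4], hex(regs[0]))   # 3 11 22 33 0x10c
```
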
Quote:
While CISC instructions varied in length, RISC instructions are all the same length and can be fetched in a single operation.

Then what about this: https://en.wikipedia.org/wiki/ARM_architecture_family#Thumb-2

Or a more modern, very fresh, one: https://drive.google.com/file/d/1zmWDmfbtVY9I6hn0vuLTbk5rsSPc44sL/view

Do they fit the "RISC" definition?
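
Thumb-2 is a concrete case of a mixed-length RISC encoding: whether an instruction is 16 or 32 bits is decided by the top five bits of its first halfword (0b11101, 0b11110 or 0b11111 mean 32-bit). The Python sketch below (function names hypothetical) applies that rule and walks a halfword stream to find instruction boundaries, the kind of sequential length decoding that the fixed-length argument says RISC avoids. The sample halfwords are, for example, 0xB500 (PUSH {lr}), 0xF000 0xF800 (the two halves of a 32-bit BL) and 0x4770 (BX lr).

```python
def thumb_insn_size(first_halfword):
    """Size in bytes of the Thumb/Thumb-2 instruction starting with this halfword.
    Top five bits 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def instruction_offsets(halfwords):
    """Byte offset of each instruction in a stream of 16-bit halfwords.
    Each instruction's start is only known after sizing the previous one."""
    offsets = []
    i = 0
    while i < len(halfwords):
        offsets.append(i * 2)
        i += thumb_insn_size(halfwords[i]) // 2
    return offsets


if __name__ == "__main__":
    stream = [0xB500, 0xF000, 0xF800, 0x4770]   # PUSH {lr}; BL; BX lr
    print(instruction_offsets(stream))           # [0, 2, 6]
```

Note that 0xE000 (a 16-bit unconditional branch, top bits 0b11100) stays 16-bit, so the check has to look at all five bits, not just the leading ones.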

That's all. As usual, you just search around things and copy them here without having knowledge about them. Basically, you're a BOT.

cdimauro

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 6:08:06

[ #1110 ]

Elite Member

Joined: 29-Oct-2012
Posts: 3650
From: Germany

@Gunnar

Quote:

Gunnar wrote:
@cdimauro

Quote:
The first edition came on 1985, right? When both machines weren't released.

Correct, the first edition did come out with the A1000 and before the A500.

This book was the bible that all Amiga developers read.

Every Amiga coder who wrote games or demos read this book.

Basically all the developers who wrote games for the Amiga,

games like MENACE, HYBRIS, TURRICAN, GIANA SISTERS ..

they all learned how to code the Amiga hardware using this book.

And this book teaches pretty well how to code the hardware directly.

Almost fully agree. It could have been 100% by changing "Amiga developers" with "Amiga game developers".
Quote:

Gunnar wrote:
@cdimauro

Quote:
So, you COMPLETELY LOST the context in this part of the discussion. You're insane! And hopeless!

Even if Hammer mixed up some stuff.
Why do you need to get personal again?

Again? You should pay more attention to how the discussion evolves.

All happened AFTER this: https://amigaworld.net/modules/newbb/viewtopic.php?mode=viewtopic&topic_id=45085&forum=33&start=1060&viewmode=flat&order=0#869177

Specifically, this:

Idiot.

So, after that he's getting what he deserves...

Now, work time...

vox

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 17:41:10

[ #1111 ]

Elite Member

Joined: 12-Jun-2005
Posts: 3737
From: Belgrade, Serbia

@NutsAboutAmiga

At the time, a few years ago, both CIA and NallePuh were the latest versions, developed by Hypex I believe.
They alleviate the problem a bit, but much software simply does not work out of the box.
Even OS4-native software like Timberwolf tended to stop working after OS 4.1 FE.

Enhancer is really nice (as seen now with the A600GS OS), but the cost of the OS 4.1 beta paid for with the X1000, OS 4.2 prepaid, the RadeonHD driver (a separate purchase at that time), later two Enhancer editions,
and a separate W3D backward-compatibility driver make it nearly insane to get things working.

On the positive side, why not license the SAGA chipset in an FPGA on some future NG board and make AmigaNG compatible with the Amiga as we know it? Not just in look and feel, but really.

A registered, full MUI4 could surely be licensed the same way the MOS team did it.

Some MorphOS components and drivers, like X5000 and SAM460 support (as well as Mac G3, G4 and G5 support), could also be traded for RadeonHD drivers.

More common goals, less duplicated work, and preferably one AmigaOS 5 would bring much joy. Both high-end 68k RTG and PPC if possible :D

_________________
Future Acube and MOS supporter, fi di good, nothing fi di unprofessionals. Learn it harder way!

Gunnar

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 18:23:43

[ #1112 ]

Regular Member

Joined: 25-Sep-2022
Posts: 478
From: Unknown

@matthey

Quote:
The 68060 received all these and was the Pentium killer Motorola needed before Apple moved to PPC and C= and Atari disappeared.

You say "BEFORE" Apple moved to PPC,
but isn't the reality that Apple moved to PPC long before the 68060 came out?

Some months before the 060 came out,
Apple was already selling complete systems with PowerPC CPUs.
And Apple, Motorola and IBM formed their PowerPC alliance long before this.

The 68K line is a wonderful CPU.
But the big problem for Motorola was that developing
and testing a new, complex 68K CPU took too many years for Motorola to stay competitive.

Motorola understood this problem and had been working on RISC alternatives since 1987,
as it understood that making RISC chips was cheaper and quicker.
Motorola developed its 88K RISC CPU in 1988.

The 68060 is a very good CPU.

The truth is that during the development of the 68060 it was already clear
that neither Apple nor any of the big Unix customers that Motorola once had would use it anymore.

We have to keep in mind that the development of the 68060 was done under time pressure and budget limitations, so a number of compromises had to be made.
For example, the 68060 development team wanted to double its fetch rate to 8 bytes/cycle.
But Motorola did not approve the budget for this update.

Last edited by Gunnar on 19-Mar-2024 at 06:33 PM.

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 21:31:42

[ #1113 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@cdimauro

You claimed Intel's AVX beats PPC's VMX/AltiVec, and that claim implies a performance criterion.

IBM's POWER8 has a SIMD instruction set with 64 128-bit VSRs, 8,192 bits in total.

Intel's current client Raptor Lake and Meteor Lake SKUs are stuck with 16-register AVX2, 4,096 bits in total.

Intel's AVX10.1 non-Xeon SKUs are stuck with a 32-register, 256-bit AVX10 SIMD instruction set, 8,192 bits in total of fast SRAM data storage.
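As a quick sanity check of the totals above: the storage figure is simply register count times register width. A minimal Python sketch using only the counts and widths stated in this post (it measures architectural register storage, not performance):

```python
# Architectural SIMD register-file capacity = register count x register width (bits).
# Counts and widths are the ones quoted in the post; this says nothing about throughput.
REGISTER_FILES = {
    "POWER8 VSX (64 x 128-bit VSRs)": (64, 128),
    "AVX2 (16 x 256-bit YMM)": (16, 256),
    "AVX10/256 (32 x 256-bit)": (32, 256),
}


def total_bits(count: int, width: int) -> int:
    """Total bits held by `count` registers of `width` bits each."""
    return count * width


for name, (count, width) in REGISTER_FILES.items():
    print(f"{name}: {total_bits(count, width)} bits")
```

This prints 8192, 4096 and 8192 bits respectively, matching the figures in the post.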

Intel's AVX10.x will not be released until Granite Rapids.

Intel does NOT guarantee 256-bit SIMD hardware for its Core i3 N-series (Gracemont) nor for Raptor Lake SKUs with more than 8 cores. Intel is selling half-baked AVX2 hardware.

You have shown your ignorance. Look in the mirror.

Going by PCIe 4.0 support and more than 4,096 bits of vector register storage, I could purchase a reasonably low-cost IBM POWER9 CPU. The problem is platform cost.

I'm not buying yet another Intel Skylake-X that struggles at 4 GHz and has aging PCIe 3.0 I/O.

Since Zen 2's release, AMD has abandoned the 128-bit SIMD units of Zen 1 and Jaguar/Puma in its desktop client SKUs.

Last edited by Hammer on 19-Mar-2024 at 10:14 PM.

_________________
Ryzen 9 7900X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB
Amiga 1200 (Rev 1D1, KS 3.2, PiStorm32lite/RPi 4B 4GB/Emu68)
Amiga 500 (Rev 6A, KS 3.2, PiStorm/RPi 3a/Emu68)

NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 22:09:36

[ #1114 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12825
From: Norway

@vox

>At the time, a few years ago, both CIA and NallePuh were the latest versions, developed by Hypex I believe.

Wrong, it was "Stephan Rupprecht", but a lot has happened since then.

https://github.com/khval/NallePuh/commits/main/

Hypex has contributed but is not a main contributor.

Last edited by NutsAboutAmiga on 19-Mar-2024 at 10:18 PM.

_________________
http://lifeofliveforit.blogspot.no/
Facebook::LiveForIt Software for AmigaOS

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 22:29:28

[ #1115 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@cdimauro

Quote:

Hey, have you bought processors that have AVX-512 but implemented only with 256 bits? Really? "Coherent

1. Refer to AM5's road map. Zen 5 is a drop-in upgrade on AM5.

LGA 1700 is a dead end.

LGA 1700 lacks AM5's two discrete PCIe 5.0 x4 NVMe links, e.g. on the ASUS ROG Strix X670E-E.

2. On most Intel CPUs with AVX-512 support, there are two classes of 512-bit instructions. The first class is executed by combining a pair of 256-bit units, hence having equal throughput for 512-bit and 256-bit instructions. The second class is executed by combining a pair of 256-bit execution units and also by extending another 256-bit execution unit to 512 bits.

For the second class of instructions, the Intel CPUs have a throughput of two 512-bit instructions per cycle vs. three 256-bit instructions per cycle.

Compared to the cheaper Intel models, Zen 4, which has the same throughput as Zen 3 (two 512-bit instructions per cycle vs. four 256-bit instructions per cycle on Zen 3), either matches or exceeds the throughput of the Intel CPUs with AVX-512.

Compared to the Intel CPUs with AVX-512, Zen 4 can execute 1 FMA + 1 FADD per cycle, while those Intel CPUs can execute only 1 FMA per cycle.

The only important advantage of Intel appears in the most expensive models of the server and workstation CPUs, i.e. in most Xeon Gold, all Xeon Platinum, and all of the Xeon W models that have AVX-512 support.

In these more expensive models, there is a second 512-bit FMA unit, which enables a double FMA throughput compared to Zen 4. These models with double FMA throughput are also helped by a double throughput for the loads from the L1 cache, which is matched to the FMA throughput.
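To put the per-cycle claims above into numbers, here is a rough Python sketch of peak FP64 FLOPs per core per clock. It assumes 512-bit vectors (8 FP64 lanes), counts an FMA as 2 FLOPs per lane and an FADD as 1, and takes the unit counts exactly as stated in this post. Clock frequency, load bandwidth and scheduling are ignored, so it is a per-clock upper bound, not a benchmark.

```python
# Peak FP64 FLOPs per core per clock, from the per-cycle unit counts claimed in the post.
# Assumptions: 512-bit vectors hold 8 FP64 lanes; an FMA counts as 2 FLOPs per lane,
# an FADD as 1. Clock speed, memory bandwidth and port conflicts are ignored.
LANES_FP64_512 = 512 // 64  # 8 doubles per 512-bit vector


def peak_flops_per_cycle(fma_units: int, fadd_units: int, lanes: int = LANES_FP64_512) -> int:
    """Upper bound on FP64 FLOPs issued per cycle by the given 512-bit units."""
    return lanes * (2 * fma_units + fadd_units)


CONFIGS = {
    "Zen 4 (1 FMA + 1 FADD per cycle)": (1, 1),
    "cheaper Intel AVX-512 parts (1 FMA per cycle)": (1, 0),
    "Xeon Gold/Platinum/W with a 2nd 512-bit FMA": (2, 0),
}

for name, (fma, fadd) in CONFIGS.items():
    print(f"{name}: {peak_flops_per_cycle(fma, fadd)} FP64 FLOPs/cycle")
```

Under these assumptions it gives 24, 16 and 32 FLOPs per cycle respectively. These are per-clock figures only; a lower all-core clock can erase the Xeon advantage, as the following paragraphs point out.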

The AVX-512 implementation in Zen 4 is superior to that in the cheaper CPUs like Tiger Lake, even without taking into account the few new execution units added in Zen 4, like the 512-bit shuffle unit.

Only the Xeon Platinum and the likes of Sapphire Rapids will have greater floating-point throughput than Zen 4, but they will also have a significantly lower all-core clock frequency (due to the inferior manufacturing process), so the higher throughput per clock cycle is not certain to overcome the clock-frequency deficit.

Intel Sapphire Rapids is not ideal for a low thread count Adobe content creation suite nor in low latency/very high clock speed PC games. Intel Sapphire Rapids wouldn't beat AMD's Ryzen 7 7800X3D in PC games.

----
Back in the real world:

For both performance per dollar and pure performance, raytracing is better on hardware-accelerated RTX Ada GPUs and perhaps on the USD $999 RX 7900 XTX. The Amiga's custom ASIC hardware acceleration spirit is alive with NVIDIA's and AMD's hardware-accelerated, raytracing-capable GPGPUs. Large bulk AI workloads are dominated by NVIDIA.

The use cases for server x86-64 CPUs are FP64, FP80, larger-scale remote desktops/virtual machines, more PCIe lanes, OS hosts, hypervisor hosts, large-scale databases, and non-GPU-accelerated programs.

There are NVIDIA and AMD server-only GPGPUs with good FP64.

For large bulk AI processing contract work, I would rather use my RTX 4090, RTX 4080, and two RTX 3080 Ti GPUs.

Quote:

What a mess. You jump from pipelining to superscalar pipelines with nonchalance: they are the same things to you. Just to show, again, how ignorant you are.

BTW, even the 6502 was pipelined...

Where's the 65K's 3 GHz implementation?

Where's the 65K's throughput of at least 1 IPC?

Where's the 65K's 32-bit implementation?

Where's the 65K's 64-bit implementation?

For ASIC implementation, the 65xx CPU family is a dead end. The ARM road map replaced Commodore's crap 65xx R&D road map. Western Design Center (WDC) stalled at the 16-bit 65C816; the 65C832 was never released.

In the real world, Acorn's ARM CPUs replaced the crap 65xx R&D road map from Commodore and Western Design Center (WDC).

Last edited by Hammer on 19-Mar-2024 at 11:13 PM.

OneTimer1

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 22:52:38

[ #1116 ]

Cult Member

Joined: 3-Aug-2015
Posts: 984
From: Unknown

@vox

Quote:

vox wrote:

Instead of Xena some FPGA OCS/AGA implementation should be onboard.

UAE for AOS4 was ignored by Hyperion; they had the right to bundle AOS 3.x with AOS4, but they refused to do it. UAE or an ECS/AGA FPGA would have turned the A1X1K into something with real Amiga legacy, but they refused.

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 23:15:52

[ #1117 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@OneTimer1

Quote:

OneTimer1 wrote:
@vox

Quote:

vox wrote:

Instead of Xena some FPGA OCS/AGA implementation should be onboard.

UAE for AOS4 was ignored by Hyperion; they had the right to bundle AOS 3.x with AOS4, but they refused to do it. UAE or an ECS/AGA FPGA would have turned the A1X1K into something with real Amiga legacy, but they refused.

AmigaOne X1000 is A-Eon Technology's toy.

The decision to form a partnership with Varisys had the consequence of bringing XMOS chips to the AmigaOne X1000.

For the AmigaOne X1000 project, Varisys is the mainboard system integrator.

Last edited by Hammer on 19-Mar-2024 at 11:17 PM.

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 23:22:51

[ #1118 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@cdimauro

Quote:
ROFL. You're counting the bits used for all SIMD registers and using the totals to compare two architectures.

So, you just count them and completely ignore how they are used. STRA-LOL.

What do you think, that the bits used in registers are like the potatoes sold at the market around the corner? The more you have, the better?

This is THE clear measure of how you do NOT understand architectures at all.

I add nothing else, because enough is enough.

What matters in the real world is performance.

https://www.youtube.com/watch?v=yTMRGERZrQE
Jim Keller: Arm vs x86 vs RISC-V - Does it Matter?

Jim Keller >>>>>> YOU.

Last edited by Hammer on 19-Mar-2024 at 11:27 PM.

Hammer

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 23:37:57

[ #1119 ]

Elite Member

Joined: 9-Mar-2003
Posts: 5312
From: Australia

@cdimauro

Quote:

what about this: https://en.wikipedia.org/wiki/ARM_architecture_family#Thumb-2

Thumb-2 technology is available in the ARMv6T2 and later architectures.

Thumb-2 instructions are still fixed-length, at a shorter 16 bits.

For ARMv6T2 and beyond, the programmer can select between two fixed-length instruction sets.

Quote:

What a mess. You jump from pipelining to superscalar pipelines with nonchalance: they are the same things to you. Just to show, again, how ignorant you are.

68060 has a low clock speed, idiot.

NutsAboutAmiga

Re: some words on senseless attacks on ppc hardware
Posted on 19-Mar-2024 23:53:32

[ #1120 ]

Elite Member

Joined: 9-Jun-2004
Posts: 12825
From: Norway

@OneTimer1

EUAE was written by Richard Drummond from 2003 to 2007; it's not a Hyperion project.
I'm not sure why he stopped working on it, but he was not focused on AmigaOS specifically. He focused on big-endian support for Linux, Mac OS X and AmigaOS 4.

https://www.rcdrummond.net/uae/

Of course, there have been contributions to EUAE by others since then.

Last edited by NutsAboutAmiga on 19-Mar-2024 at 11:55 PM.


Amigaworld.net was originally founded by David Doyle