Amigaworld.net - The Amiga Computer Community Portal Website

Trevor's Amiga Blog

redfox

Re: Trevor's Amiga Blog
Posted on 21-Apr-2023 23:16:44

[ #1541 ]

Elite Member

Joined: 7-Mar-2003
Posts: 2100
From: Canada

@number6

Thanks for the link to Trevor's blog.

I hope we get some A1222+ news soon.

---
redfox

Status: Offline

dirkzwager

Re: Trevor's Amiga Blog
Posted on 23-Apr-2023 11:39:26

[ #1542 ]

Regular Member

Joined: 9-Aug-2019
Posts: 129
From: Belgium, LImburg, Bilzen

@redfox

I have been hoping for that for so many years. I have had my Sam460 for 3 years now.

_________________
Amiga 500+,
Pi 3b+ and Amiga os 3.10
Amikit XE usb Stick
Powerbook 17" and MorphOS 3.13
PowerMac G5 2.3 MP and MorphOS 3.13
Sam 460 with Amiga os 4.1 and checkmate 1500
www.sitedesign.be

Status: Offline

amigakit

Re: Trevor's Amiga Blog
Posted on 5-Jun-2024 22:58:08

[ #1543 ]

Amiga Kit

Joined: 28-Jun-2004
Posts: 2670
From: www.amigakit.com

A new blog entry has been added today entitled "Mid-winter solstice"

_________________
Amiga Kit Amiga Store
Links: www.amigakit.com | New Products | A600GS

Status: Offline

Hammer

Re: Trevor's Amiga Blog
Posted on 6-Jun-2024 3:27:10

[ #1544 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6505
From: Australia

@NutsAboutAmiga

SAM460 seems to have less technical drama when compared to Articia S AmigaOnes and a certain S54 with a nonstandard PPC FPU.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

kolla

Re: Trevor's Amiga Blog
Posted on 6-Jun-2024 7:20:54

[ #1545 ]

Elite Member

Joined: 20-Aug-2003
Posts: 3475
From: Trondheim, Norway

@amigakit

Why does Trevor write that the screenshot is of SysInfo under AmiBench when it clearly is Kickstart 1.x components that are listed? Why isn't it showing SysInfo under AmiBench? Why is it so hard to get things right? Why, mummy, why why why?

_________________
B5D6A1D019D5D45BCC56F4782AC220D8B3E2A6CC

Status: Offline

Cheese

Re: Trevor's Amiga Blog
Posted on 6-Jun-2024 19:18:20

[ #1546 ]

Regular Member

Joined: 23-Oct-2006
Posts: 315
From: Unknown

@kolla

_________________
x86/MorphOS 4.0

"Delving into the past can be a dangerous exercise." -hyperionmp

"I've been a supporter of "REACTION" GUI because is an Amiga OS thing." -Snuffy

"I personally prefer a vision of do'ers and makers rather than

Status: Offline

matthey

Re: Trevor's Amiga Blog
Posted on 7-Jun-2024 3:23:37

[ #1547 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2754
From: Kansas

amigakit Quote:

A new blog entry has been added today entitled "Mid-winter solstice"

"Innovation", "passion", "creativity" and "pioneering" created personal computers like the Amiga but the Amiga today is about endless books, emulation and preserving what will soon be forgotten. RIP Amiga.

Status: Offline

pavlor

Re: Trevor's Amiga Blog
Posted on 7-Jun-2024 14:44:02

[ #1548 ]

Elite Member

Joined: 10-Jul-2005
Posts: 9693
From: Unknown

@amigakit

Thanks for posting!

Status: Offline

number6

Re: Trevor's Amiga Blog
Posted on 28-Aug-2024 18:55:24

[ #1549 ]

Elite Member

Joined: 25-Mar-2005
Posts: 11883
From: In the village

@thread

Just adding Trevor's blog posting from August 6, 2024:

Back to winter down-under

#6

_________________
This posting, in its entirety, represents solely the perspective of the author.
*Secrecy has served us so well*

Status: Offline

matthey

Re: Trevor's Amiga Blog
Posted on 29-Aug-2024 19:59:51

[ #1550 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2754
From: Kansas

#6 Quote:

Just adding Trevor's blog posting from August 6, 2024:

Back to winter down-under

Trevor is just as oblivious and delusional as ever. All is well in Amiga Neverland.

http://blog.a-eon.biz/blog/index.php/2024/08/06/back-to-winter-down-under/ Quote:

AmiBench's performance is further boosted by AmigaKit's ARMgraphics.library, which bypasses the 68K graphics bottleneck to accelerate graphics rendering.

There was an Amiga chip memory bottleneck due to cheap CBM chipsets and memory, but it had nothing to do with the 68k, and it is removed on the A600GS with chip memory speed "a massive 412.98 compared to the chip speed of an A600". Where is there a "68K graphics bottleneck"?

No 68k CPU has "graphics" inside the CPU, so logically any bottleneck would come from accessing the external address space of memory and hardware registers. That is exactly where the load/store ARM Cortex-A53 has bottlenecks, not 68k CPUs.

1. load-to-use stall bottleneck
2. load/store bottleneck
3. fetch and instruction cache bottleneck due to poor code density

Native ARM code with instruction scheduling can partially remove the huge ARM Cortex-A53 bottleneck due to load-to-use stalls. Simple and quick emulation conversion of 68k to AArch64 code usually does not include instruction scheduling though. Worst-case scheduling can be expected because 68k CPUs do not have load-to-use stalls, so 68k code is not scheduled to avoid them: load results are usually accessed by the very next instruction, which stalls the Cortex-A53. A load-to-use penalty of 3 cycles is a performance killer and bad for an 8-stage CPU, especially when 68k CPUs didn't have any load-to-use penalty. Then there is the RISC load/store instruction bottleneck, which requires more instructions, memory traffic and registers than CISC.
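To make the scheduling point concrete, here is a rough Python sketch of a toy in-order pipeline. It is my own simplification, not a model of the real Cortex-A53: it issues only one instruction per cycle (the A53 is dual-issue), and the `Insn` type, the register names and the `count_cycles` helper are all made up for illustration. It only shows how a load-to-use penalty hurts code where each load is consumed by the very next instruction, and how reordering hides part of it.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    kind: str     # "load" or "alu"
    dest: str     # register written
    srcs: tuple   # registers read


def count_cycles(program, load_to_use_penalty):
    """Cycles to issue `program` on a toy single-issue in-order pipeline."""
    ready = {}    # register -> first cycle its value can be consumed
    cycle = 0     # earliest cycle the next instruction may issue
    for insn in program:
        # In-order issue: wait for the previous instruction and all sources.
        issue = max([cycle] + [ready.get(r, 0) for r in insn.srcs])
        extra = load_to_use_penalty if insn.kind == "load" else 0
        ready[insn.dest] = issue + 1 + extra
        cycle = issue + 1
    return cycle


# 68k-style ordering: every load result is used by the next instruction.
unscheduled = [
    Insn("load", "r1", ("a0",)), Insn("alu", "r1", ("r1",)),
    Insn("load", "r2", ("a1",)), Insn("alu", "r2", ("r2",)),
    Insn("load", "r3", ("a2",)), Insn("alu", "r3", ("r3",)),
]

# Same work with the loads hoisted ahead of their uses.
scheduled = [
    Insn("load", "r1", ("a0",)), Insn("load", "r2", ("a1",)),
    Insn("load", "r3", ("a2",)),
    Insn("alu", "r1", ("r1",)), Insn("alu", "r2", ("r2",)),
    Insn("alu", "r3", ("r3",)),
]

for name, prog in (("unscheduled", unscheduled), ("scheduled", scheduled)):
    for penalty in (0, 3):
        print(name, "penalty", penalty, "->", count_cycles(prog, penalty), "cycles")
```

In this toy model the unscheduled sequence takes 15 cycles with a 3 cycle penalty against 6 cycles with none, and hoisting the loads brings it down to 7. A quick 68k to AArch64 translation that keeps the original instruction order stays closer to the unscheduled case.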

A typical general purpose CPU workload of instructions will average to approximately the following.

load 26% (~25% which is 5 out of 20 instructions)
store 10% (2 out of 20 instructions)
ALU 49% (~50% which is 10 out of 20 instructions)
branch 15% (3 out of 20 instructions)

20 instruction typical workload for ARM Cortex-A53 emulating 68k code
load x5 with load-to-use stalls (5*4= 20 cycles)
store x2 (2*0.5 to 2*1= 1-2 cycles with store buffer)
ALU x10 (10*0.5 to 10*1= 5-10 cycles)
branch x3 (most are conditional predicted branches so ~0 cycles)
---
total: 26-32 cycles

The Cortex-A53 load instruction execution throughput is 1 cycle but the latency is 3 cycles, so the load result can't be used for 3 cycles. Much of the Cortex-A53 performance improvement over the predecessor Cortex-A7 is due to improved superscalar execution, but this requires very good and sometimes impossible instruction scheduling due to the increased load-to-use penalty. The Cortex-A53 has two simple integer execution pipelines so it can execute two ALU instructions in a cycle, but so could the 68060, which has most of the same design features without the bottlenecks.

20 instruction typical workload for 68060
load+ALU x5 (5*0.5 to 5*1= 2.5-5 cycles)
store x2 (2*0.5 to 2*1= 1-2 cycles with store buffer)
ALU x5 (5*0.5 to 5*1= 2.5-5 cycles)
branch x3 (most are conditional predicted branches so ~0 cycles)
---
total: 6-12 cycles
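For anyone who wants to check the arithmetic in the two tallies above or plug in other numbers, here is a small Python sketch. The per-instruction costs are taken straight from the tallies; the `cycle_range` helper and the row layout are my own naming. Note that the 68060 rows add up to 15 instructions, because each of the 5 loads is folded into a load+ALU instruction.

```python
def cycle_range(rows):
    """Sum (count, best, worst) per-instruction costs into (best, worst) totals."""
    best = sum(count * lo for count, lo, hi in rows)
    worst = sum(count * hi for count, lo, hi in rows)
    return best, worst


LOAD_TO_USE_PENALTY = 3  # Cortex-A53 figure used in the tally above

cortex_a53 = [
    # (count, best cycles each, worst cycles each)
    (5, 1 + LOAD_TO_USE_PENALTY, 1 + LOAD_TO_USE_PENALTY),  # loads, unscheduled
    (2, 0.5, 1),    # stores (store buffer)
    (10, 0.5, 1),   # ALU (dual issue at best)
    (3, 0, 0),      # predicted branches
]

m68060 = [
    (5, 0.5, 1),    # load+ALU
    (2, 0.5, 1),    # stores (store buffer)
    (5, 0.5, 1),    # ALU
    (3, 0, 0),      # predicted branches
]

a53_best, a53_worst = cycle_range(cortex_a53)
m68k_best, m68k_worst = cycle_range(m68060)
print("Cortex-A53:", a53_best, "to", a53_worst, "cycles")
print("68060:", m68k_best, "to", m68k_worst, "cycles")
```

Setting `LOAD_TO_USE_PENALTY` to 0 shows how much of the Cortex-A53 total in this simple model comes from the stalls alone.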

The in-order superscalar 68060 design not only eliminates bottleneck #1, load-to-use stalls, but CISC load+ALU instructions also avoid bottleneck #2. The load+ALU instructions are pipelined and superscalar, executing in a single cycle, where the Cortex-A53 requires two single cycle instructions with a 3 cycle load-to-use stall between them unless the code is scheduled. The 68060 design makes instruction scheduling much easier, and the 68060 has amazing performance for an in-order design considering most compilers have no 68060-specific instruction scheduler.

The SiFive U74 core architects used a similar 68060-like design for RISC-V with impressive results, even though only bottleneck #1 is removed: RISC-V is load/store, so it still suffers from bottleneck #2 and bottleneck #3 (RVC and AArch64 still have inferior code density to the 68060). The simple and tiny in-order SiFive U74 core not only outperforms the in-order Cortex-A53 for integer performance but also the OoO PPC G5 in some benchmarks, with a fraction of the units and transistors. An equivalent RVC L1I instruction cache may contain twice as much code as a PPC L1I cache, so bottleneck #3 is reduced, even though a 68k L1I cache may contain twice as much code as RVC and four times the amount of code as PPC. Maybe the better code density was the larger part of the SiFive U74 core advantage, because OoO is supposed to remove most load-to-use stalls and reduce performance loss from poor instruction scheduling.

One of the big selling points of the in-order Cortex-A53 was that it offered similar if not better performance compared to the OoO Cortex-A9 with a smaller, lower power in-order core.

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/the-top-5-things-to-know-about-cortex-a53 Quote:

3. Higher performance than Cortex-A9: smaller and more efficient too

The Cortex-A9 features an out-of-order pipeline, dual issue capability, and a longer pipeline than Cortex-A53 that enables 15% higher frequency operation. However the Cortex-A53 achieves higher single thread performance by pushing a simpler design farther - some of the key factors enabling the performance of the Cortex-A53 include the integrated low latency level 2 cache, the larger 512 entry main TLB, and the complex branch predictor. The Cortex-A9 has set the bar for the high end of the smartphone market through 2012 - by matching and exceeding that level of performance in a smaller footprint and power budget, the Cortex-A53 delivers performance to entry level devices that was previously enjoyed by high-end flagship mobile devices - in a lower power budget and at lower cost. The graph below compares the single thread performance of the high efficiency Cortex-A processors with the Cortex-A9. At the same frequency, Cortex-A53 delivers more than 20% higher instruction throughput than the Cortex-A9 for representative workloads.

The ARM doc above also verifies that a longer pipeline enables higher frequency, but it can result in larger performance-killing load-to-use penalties with RISC designs. The early RISC shallow pipeline performance advantage quickly disappeared as longer pipeline, less microcoded CISC designs appeared that avoided classic RISC bottlenecks #1-3. The Cortex-A53 suffers from all RISC bottlenecks, with bottleneck #1, the load-to-use stall, worse than the classic RISC pipeline by a cycle (3 cycles vs 2), while bottleneck #3, code density, is somewhat better than most classic RISC ISAs. Even the classic RISC pipeline 2 cycle load-to-use penalty was researched with simulations in an old paper. It showed that zero cycle loads (zero cycle load-to-use penalty) provided more integer performance than going from 8 to 32 GP registers, and that an in-order CPU design with zero cycle loads was surprisingly close to an aggressive OoO design.

https://ftp.cs.wisc.edu/sohi/papers/1995/micro.zcl.pdf Quote:

The column labeled Cycle(In+ZCL)=Cycle(Out) repeats the experiments, except the in-order issue processor has support for zero-cycle loads. For the integer codes, the performance of the two processors is now much closer - both out performing each other in some cases, with slightly better performance on the out-of-order issue processor.

This result is striking when one considers the clock cycle and design time advantages typically afforded to in-order issue processors. It may be the case that for workloads where untolerated latency is dominated by data cache access latencies (as in the case of the integer benchmarks), an in-order issue design with support for zero-cycle loads may consistently out perform an out-of-order issue processor.

In-order CPU designs are at a disadvantage due to long load latencies stalling the CPU waiting on data or instructions not in the L1 cache but RISC load-to-use stalls, load/store instructions and poor code density increase the bottlenecks while CISC designs improve them. Smaller, lower power and cheaper in-order core designs are compelling which is a major reason why the Cortex-A53 was/is likely the most popular ARM core ever and why it was naively chosen for THEA500 Mini and A600GS despite the 3 cycle load-to-use stall making it far from ideal for emulation.

https://www.anandtech.com/show/6420/arms-cortex-a57-and-cortex-a53-the-first-64bit-armv8-cpu-cores Quote:

ARM claims that on the same process node (32nm) the Cortex A53 is able to deliver the same performance as a Cortex A9 but at roughly 60% of the die area. The performance claims apply to both integer and floating point workloads. ARM tells me that it simply reduced a lot of the buffering and data structure size, while more efficiently improving performance. From looking at Apple's Swift it's very obvious that a lot can be done simply by improving the memory interface of ARM's Cortex A9. It's possible that ARM addressed that shortcoming while balancing out the gains by removing other performance enhancing elements of the core.

I've explained all this before, although maybe not all in one place. The memory/cache/load bottlenecks are with RISC CPUs compared with CISC CPUs, and added together they have negative synergies and are significant. Existing 68k code is written assuming CISC advantages which can only be fully realized on CISC designs. CISC advantages can translate into lower cost, with cheaper hardware requirements, no emulation and no ARM royalties, but mass production is required. Emulation is not competitive and adds more bottlenecks on top of the RISC bottlenecks, especially for cheap in-order CPU designs with much increased cache requirements.
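As a footnote on the cache requirements, the L1I ratios claimed earlier in this post (RVC holds about twice the code of PPC, 68k about twice the code of RVC) can be turned into a quick back-of-envelope calculation. This is only a Python sketch of those claimed ratios, not measured data, and the `RELATIVE_DENSITY` table and `equivalent_cache_kib` helper are names I made up for the example.

```python
# Relative amount of code held per byte of L1I cache, using the ratios
# claimed in the post (PPC = 1). These are rough claims, not measurements.
RELATIVE_DENSITY = {"PPC": 1, "RVC": 2, "68k": 4}


def equivalent_cache_kib(cache_kib, from_isa, to_isa):
    """L1I size `to_isa` would need to hold the same code as `cache_kib` of `from_isa`."""
    return cache_kib * RELATIVE_DENSITY[from_isa] / RELATIVE_DENSITY[to_isa]


for isa in ("RVC", "PPC"):
    needed = equivalent_cache_kib(8, "68k", isa)
    print("8 KiB of 68k code needs about", needed, "KiB of L1I as", isa)
```

Under those ratios an 8 KiB 68k instruction cache holds as much code as a 16 KiB RVC cache or a 32 KiB PPC cache.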