Amigaworld.net - The Amiga Computer Community Portal Website


Amiga Development

Packed Versus Planar: FIGHT


MEGA_RJ_MICAL

Re: Packed Versus Planar: FIGHT
Posted on 6-Aug-2022 4:55:32

[ #101 ]

Super Member

Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

ZORRAM

_________________
I HAVE ABS OF STEEL
--
CAN YOU SEE ME? CAN YOU HEAR ME? OK FOR WORK


Hypex

Re: Packed Versus Planar: FIGHT
Posted on 6-Aug-2022 6:51:50

[ #102 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@Trekiej

Quote:
I guess I am still trying to understand the versatility of the planar arrangement.

I suppose one versatility, if it could be called that, is transparent expansion and layer-level access. That is, all things being equal, you start off with one plane, as a 1 bpp bitmap. If you want more colour, you simply add another plane and double the number of colours, with the pixel bits still in exactly the same place. You just need to set or clear the same bits in each plane.
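A minimal sketch of that expansion, assuming a simple bitmap of separate planes with the leftmost pixel in each byte's most significant bit (the `PlanarBitmap` type and helper names are hypothetical, not an Amiga API). Writing a colour touches the same bit position in every plane, so adding a plane never moves existing bits:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLANES 8

/* Hypothetical planar bitmap: 'depth' separate planes, each
   'bytes_per_row' wide, leftmost pixel of each byte in the MSB. */
typedef struct {
    int depth;
    int bytes_per_row;
    uint8_t *planes[MAX_PLANES];
} PlanarBitmap;

/* Write colour index 'colour' at (x, y): bit p of the colour goes to the
   same bit position in plane p. */
static void planar_set_pixel(PlanarBitmap *bm, int x, int y, unsigned colour)
{
    assert(bm->depth >= 1 && bm->depth <= MAX_PLANES);
    int offset = y * bm->bytes_per_row + x / 8;
    uint8_t bit = (uint8_t)(0x80 >> (x % 8));
    for (int p = 0; p < bm->depth; p++) {
        if (colour & (1u << p))
            bm->planes[p][offset] |= bit;
        else
            bm->planes[p][offset] &= (uint8_t)~bit;
    }
}

/* Read the colour index back by gathering one bit from each plane. */
static unsigned planar_get_pixel(const PlanarBitmap *bm, int x, int y)
{
    int offset = y * bm->bytes_per_row + x / 8;
    uint8_t bit = (uint8_t)(0x80 >> (x % 8));
    unsigned colour = 0;
    for (int p = 0; p < bm->depth; p++)
        if (bm->planes[p][offset] & bit)
            colour |= 1u << p;
    return colour;
}
```

Going from 4 to 8 colours here is just a matter of pointing `planes[2]` at a cleared buffer and setting `depth` to 3; every existing pixel keeps its value.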

You can optimise your planes to suit the palette and organise them around your colours. Some images don't need all planes; they use a subset of planes, so you only need to access those planes for that image. We see plane-picking fields in image structures, where there is a plane map of the planes used. So an image with fewer colours can be blitted or written into a bitmap with more colours without writing all planes, just the ones it uses. That is probably the biggest benefit, and the user interface is built on these memory-saving measures.
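A sketch of the plane-pick idea on plain byte arrays, not the real Intuition code: the source image stores only the planes whose bits are set in the pick mask, and only those destination planes are written. The function name is hypothetical; Intuition's `Image` structure pairs `PlanePick` with `PlaneOnOff`, which additionally fills the unpicked planes, and that part is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Copy 'nbytes' of image data into only the destination planes whose bit
   is set in 'plane_pick'. The source holds one plane per set bit, in
   ascending plane order, so a 2-plane image costs 2 planes of memory even
   when drawn into a deeper bitmap. Unpicked planes are never touched. */
static int blit_picked_planes(uint8_t *const dst_planes[], int dst_depth,
                              const uint8_t *const src_planes[], int nbytes,
                              unsigned plane_pick)
{
    assert(dst_depth >= 1 && dst_depth <= 8);
    int used = 0;
    for (int p = 0; p < dst_depth; p++) {
        if (!(plane_pick & (1u << p)))
            continue;                     /* plane not used by this image */
        for (int i = 0; i < nbytes; i++)
            dst_planes[p][i] = src_planes[used][i];
        used++;
    }
    return used;                          /* source planes consumed */
}
```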

On top of this, no pun intended, would be layer-level access, that is, treating each plane as a separate graphic layer. I don't know how useful this would be given the limitations. Each layer could only be one colour, since each plane is only 1 bpp, and the palette would likely need ordering to suit it. Of course, like other layer tricks, it tends to reduce colours: an 8-bit screen of 8 plane layers limited to 8 colours max doesn't sound very useful. Using a full palette may provide some interesting effects, but being of practical use is another matter.
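The palette ordering such layers need can be sketched as follows, assuming the highest-numbered plane is frontmost and a clear bit means "transparent" (the function name and the 0xRRGGBB colours are hypothetical). Every index shows the colour of its highest set plane:

```c
#include <assert.h>
#include <stdint.h>

/* Build a 2^depth-entry palette where each plane acts as a one-colour
   layer: an index shows layer_rgb[p] for its highest set bit p, and
   index 0 (no layer set) shows the background. */
static void build_layer_palette(uint32_t palette[], int depth,
                                const uint32_t layer_rgb[], uint32_t background)
{
    assert(depth >= 1 && depth <= 8);
    palette[0] = background;
    for (unsigned idx = 1; idx < (1u << depth); idx++) {
        int top = 0;
        for (int p = depth - 1; p >= 0; p--) {
            if (idx & (1u << p)) {
                top = p;                  /* frontmost layer present */
                break;
            }
        }
        palette[idx] = layer_rgb[top];
    }
}
```

With depth 8 this spends all 256 entries to show only eight layer colours plus the background, which is the cost described above.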

Quote:
Chunking still looks straightforward to me, and I am thinking of a more modern machine that has more hardware to use.

Chunking. Lol. Well, it is straightforward, since the pixels are in direct chunks, or bytes in the 8-bit case.

As a comparison, here is an 8-bit chunky to planar routine written in C. p7 to p0 are 8 packed pixels in "bit" order, that is, p7 leftmost and p0 rightmost. For simplicity, plane is an array of plane bytes showing what each plane holds: plane[7] gets the top bit of every pixel and plane[0] the lowest bit.


plane[7] = (p7 >> 0 & 128) | (p6 >> 1 & 64) | (p5 >> 2 & 32) | (p4 >> 3 & 16) | (p3 >> 4 & 8) | (p2 >> 5 & 4) | (p1 >> 6 & 2) | (p0 >> 7 & 1);
plane[6] = (p7 << 1 & 128) | (p6 >> 0 & 64) | (p5 >> 1 & 32) | (p4 >> 2 & 16) | (p3 >> 3 & 8) | (p2 >> 4 & 4) | (p1 >> 5 & 2) | (p0 >> 6 & 1);
plane[5] = (p7 << 2 & 128) | (p6 << 1 & 64) | (p5 >> 0 & 32) | (p4 >> 1 & 16) | (p3 >> 2 & 8) | (p2 >> 3 & 4) | (p1 >> 4 & 2) | (p0 >> 5 & 1);
plane[4] = (p7 << 3 & 128) | (p6 << 2 & 64) | (p5 << 1 & 32) | (p4 >> 0 & 16) | (p3 >> 1 & 8) | (p2 >> 2 & 4) | (p1 >> 3 & 2) | (p0 >> 4 & 1);
plane[3] = (p7 << 4 & 128) | (p6 << 3 & 64) | (p5 << 2 & 32) | (p4 << 1 & 16) | (p3 >> 0 & 8) | (p2 >> 1 & 4) | (p1 >> 2 & 2) | (p0 >> 3 & 1);
plane[2] = (p7 << 5 & 128) | (p6 << 4 & 64) | (p5 << 3 & 32) | (p4 << 2 & 16) | (p3 << 1 & 8) | (p2 >> 0 & 4) | (p1 >> 1 & 2) | (p0 >> 2 & 1);
plane[1] = (p7 << 6 & 128) | (p6 << 5 & 64) | (p5 << 4 & 32) | (p4 << 3 & 16) | (p3 << 2 & 8) | (p2 << 1 & 4) | (p1 >> 0 & 2) | (p0 >> 1 & 1);
plane[0] = (p7 << 7 & 128) | (p6 << 6 & 64) | (p5 << 5 & 32) | (p4 << 4 & 16) | (p3 << 3 & 8) | (p2 << 2 & 4) | (p1 << 1 & 2) | (p0 >> 0 & 1);
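The unrolled lines above all follow one pattern: bit k of each pixel lands in plane[k], at the bit position matching the pixel's place in the group. A loop form makes the pattern explicit and gives an inverse to check it against. This is a sketch for clarity, not speed, and the names c2p_8x8 and p2c_8x8 are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Loop form of the unrolled routine. pixels[0] is the leftmost pixel
   (p7 in the post) and pixels[7] the rightmost (p0). plane[k] collects
   bit k of every pixel, with the leftmost pixel in the MSB. */
static void c2p_8x8(const uint8_t pixels[8], uint8_t plane[8])
{
    for (int k = 0; k < 8; k++) {
        uint8_t out = 0;
        for (int i = 0; i < 8; i++)
            out = (uint8_t)((out << 1) | ((pixels[i] >> k) & 1));
        plane[k] = out;
    }
}

/* Inverse (planar to chunky): rebuild the 8 pixels from the plane bytes. */
static void p2c_8x8(const uint8_t plane[8], uint8_t pixels[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t px = 0;
        for (int k = 0; k < 8; k++)
            px = (uint8_t)(px | (((plane[k] >> (7 - i)) & 1) << k));
        pixels[i] = px;
    }
}
```

For the pixels {0xFF, 0x00, 0x01, 0x80, 0x55, 0xAA, 0x0F, 0xF0}, plane[0] (the low bits) comes out as 0xAA and plane[7] (the top bits) as 0x95, the same as the hand-unrolled expressions give.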

Last edited by Hypex on 06-Aug-2022 at 06:52 AM.


Hypex

Re: Packed Versus Planar: FIGHT
Posted on 6-Aug-2022 7:49:55

[ #103 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@matthey

Quote:
The Amiga chipset legacy blitter does not work with chunky and more CPU performance, memory and memory bandwidth is needed for 16 bit chunky in AA+ but that could have been easily remedied by CBM using higher performance CPUs provided they could change their vision (Lew Eggebrecht VP of Engineering at CBM who was developing AA+ seemed to have the necessary understanding but too many mistakes were made before and above him). Integrating the CPU and GPU has the potential to improve performance, power efficiency and cost as can be seen by the move back to it with SoCs today.

Actually, I don't think the blitter would need much adapting for chunky at all. At the sides you have a mask as always, but this can be set to mask off nibbles or bytes as required. A whole-image mask need not be a pure bit mask that has to be scaled across to match the packed data; it can simply be organised as a packed mask. Then packed blits are simply straight bits as always, but to function, the blitter needs to handle the extra width, so it would need the ability to process up to 8 times the width for 8-bit packed. Blitting from planar to packed or vice versa is more complicated and I see no immediate need for it. Lines could still work, since the line can be stretched across with the packed data as the texture.
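The edge masks for such a packed blit follow from the bit positions of the first and last pixel. A sketch in the spirit of the blitter's first/last word masks (BLTAFWM/BLTALWM), with hypothetical names; bits are numbered from the MSB, as the Amiga displays them, and any depth from 1 to 16 bits works:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int first_word, last_word;
    uint16_t first_mask, last_mask;
} SpanMasks;

/* Words touched by pixels x0..x1 (inclusive) of a packed row with 'bpp'
   bits per pixel, and the masks keeping only the span's bits in the
   first and last word. */
static SpanMasks packed_span_masks(int x0, int x1, int bpp)
{
    assert(bpp >= 1 && bpp <= 16 && x0 >= 0 && x0 <= x1);
    long start_bit = (long)x0 * bpp;
    long end_bit = (long)(x1 + 1) * bpp - 1;
    SpanMasks m;
    m.first_word = (int)(start_bit / 16);
    m.last_word = (int)(end_bit / 16);
    m.first_mask = (uint16_t)(0xFFFFu >> (start_bit % 16));
    m.last_mask = (uint16_t)(0xFFFFu << (15 - end_bit % 16));
    if (m.first_word == m.last_word) {    /* span inside a single word */
        m.first_mask &= m.last_mask;
        m.last_mask = m.first_mask;
    }
    return m;
}
```

For 4-bit pixels 1 to 5 this gives words 0 and 1 with masks 0x0FFF and 0xFF00; for a single 3-bit pixel at x = 5 the span crosses into the next word, giving masks 0x0001 and 0xC000.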

Quote:
Chunky is most efficient when pixel accesses are always naturally aligned as misaligned pixels can be 2 memory accesses and requires extra shifting logic in hardware.

It's been discussed that odd sizes are technically feasible.

Quote:
Instead of a chunky Amiga only offering 2, 4 or 16 color graphic modes, the planar Amiga is more colorful and flexible/scalable with 2, 4, 8, 16, 32 and HAM6 modes.

It's been discussed here and elsewhere that all those sizes can work with packed as well, with no need for alignment: packed end to end. They can even all fit on one whole line with no gaps at the end.

Now, I naturally assumed that only powers of two would be used, as in the examples you provided, and that any other sizes would need padding up to the next power of two. But it is possible that variable-width pixels could be supported; the hardware just needs to mask the pixels out.
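Reading and writing pixels of any width packed end to end only needs masking and shifting over a two-word window, since a pixel may straddle a word boundary. A sketch with hypothetical names, assuming 16-bit words with the leftmost pixel in the most significant bits and one spare word at the end of the row for the second read:

```c
#include <assert.h>
#include <stdint.h>

/* Read pixel x from a row of 'bpp'-bit pixels (1..8) packed end to end. */
static unsigned packed_get_pixel(const uint16_t *row, int x, int bpp)
{
    assert(bpp >= 1 && bpp <= 8 && x >= 0);
    long bit = (long)x * bpp;
    uint32_t window = ((uint32_t)row[bit / 16] << 16) | row[bit / 16 + 1];
    int shift = 32 - (int)(bit % 16) - bpp;
    return (window >> shift) & ((1u << bpp) - 1);
}

/* Write pixel x, touching only its own bits in the two-word window. */
static void packed_set_pixel(uint16_t *row, int x, int bpp, unsigned value)
{
    assert(bpp >= 1 && bpp <= 8 && x >= 0);
    long bit = (long)x * bpp;
    long w = bit / 16;
    uint32_t window = ((uint32_t)row[w] << 16) | row[w + 1];
    int shift = 32 - (int)(bit % 16) - bpp;
    uint32_t mask = ((1u << bpp) - 1) << shift;
    window = (window & ~mask) | (((uint32_t)value << shift) & mask);
    row[w] = (uint16_t)(window >> 16);
    row[w + 1] = (uint16_t)window;
}
```

Setting 3-bit pixel 5 to the value 5 writes its bits 15 to 17, so it ends up split as 0x0001 in word 0 and 0x4000 in word 1.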

Had the Amiga supported packed, it's reasonable to assume the same 16-bit width would apply to pixel reads, so working in multiples of 16 bits might seem to restrict the packed sizes. However, it actually evens out. A 3-bit deep planar bitmap needs one 16-bit word from each of its 3 planes, 3 words in total, to read 16 3-bit pixels. With 3-bit packed it also needs 3 words, only in a row, and they all fit in the same 6-byte space! Do the math and it scales upwards: the pixels take up the same space regardless of format, and all fit evenly into whole words of data.
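The word counts can be checked directly. A sketch with hypothetical helper names: planar rounds each plane's row up to whole words separately, while packed rounds the whole row once, so packed never needs more words and needs exactly the same number whenever the width is a multiple of 16.

```c
#include <assert.h>

/* Planar: each of the 'depth' planes supplies ceil(width / 16) words. */
static int planar_words_per_row(int width, int depth)
{
    assert(width > 0 && depth > 0);
    return depth * ((width + 15) / 16);
}

/* Packed end to end: width * depth bits, rounded up to whole words. */
static int packed_words_per_row(int width, int depth)
{
    assert(width > 0 && depth > 0);
    return (width * depth + 15) / 16;
}
```

For 16 pixels at 3 bits both give 3 words; for an image 121 pixels wide at 3 bits, planar needs 24 words per row and packed 23.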


matthey

Re: Packed Versus Planar: FIGHT
Posted on 6-Aug-2022 17:09:28

[ #104 ]

Elite Member

Joined: 14-Mar-2007
Posts: 2747
From: Kansas

Hypex Quote:

That's what you get with an A4000 when you add a graphics card!

In fact there was an official RTG solution: the Commodore A2410, which could plug into an A2000 or A3000.

The 1991 CBM A2410 graphics card was not for the low-end C64-replacement Amiga but for the high-end CBM Unix/BSD workstations that just happened to use Amiga hardware. It's obviously not for the Amiga, because it has 2MiB of dual-ported VRAM doubling the memory bandwidth, which the Amiga custom chips needed and which CBM denied to Jay Miner for his Ranger chipset back in 1986-1987. The specs allowed higher resolutions and 256 colors while improving graphics performance. The A2410 beat AGA to market; AGA increased the memory bandwidth in a cheaper but more restrictive way. Even though the A2410 beat the late AGA to market, it was not new technology. The TI TMS34010 was released in 1986, and the TMS34020 with 3D support was released in 1988 and was used for virtual reality on the Amiga with the Rambrandt Amiga extension card from Progressive Peripherals & Software.

https://en.wikipedia.org/wiki/TMS34010

The A2410 used TIGA which was superior to SVGA when it was released in 1989.

https://en.wikipedia.org/wiki/Texas_Instruments_Graphics_Architecture Quote:

Texas Instruments Graphics Architecture (TIGA) is a graphics interface standard created by Texas Instruments that defined the software interface to graphics processors. Using this standard, any software written for TIGA should work correctly on a TIGA-compliant graphics interface card.

The TIGA standard is independent of resolution and color depth which provides a certain degree of future proofing. This standard was designed for high-end graphics.

However, TIGA was not widely adopted. Instead, VESA and Super VGA became the de facto standard for PC graphics devices after the VGA.

...

Despite the superiority of the technology in comparison to typical SuperVGA cards of the era, the relatively high cost and emerging local bus graphics standards meant that IT distributors and PC manufacturers could not see a niche for these products at consumer level.

The (limited) success of the graphics cards paved the way for products based upon various derivatives and clones of IBM's 8514 architecture. Part of the effort to make graphics accelerators useful required TI to convince Microsoft that the internal interfaces to its Windows Operating System had to be adaptable instead of hard-coded. Indeed, all versions of Windows prior to Windows 3.0 were "hard-coded" to specific graphics hardware.

The TIGA standard and TMS340x0 graphics cards paved the way for modern versatile and programmable graphics cards but cheaper standards won the early battles.

Hypex Quote:

However, even though it's common in laptops today to feature dual graphics chipsets, adding VGA to the Amiga would have complicated it and added to the expense. We have to consider that VGA is a complex graphical device. It may lack sprites and other Amiga features, but it is way more complicated in other ways. Would adding that complication have been worth it? What did it provide that we needed? VGA offered 6-bit RGB and was also planar based, but pixel planar in VGA modes. Straight linear framebuffer modes had limitations, and the "Doom" mode wasn't known about until later. Also, VGA wasn't the only chunky hardware around; other computers like Acorns and Apples featured chunky modes without needing any extra VGA chip.

It is also foreign, so I imagine it would be hard to integrate. Adding VGA to an Amiga tends to hand it control of the Amiga, as Amiga modes become second class. For a proper Amiga solution, I think the Amiga would need to be in control of VGA, so the copper could control VGA passthrough and allow screen dragging of RTG modes. No RTG solutions did this AFAIK. RTG screen dragging was emulated through legacy raster interrupts or blitting, as on OS4.

If what we wanted from VGA was a straight-out chunky mode, I think we would have been better off if it had just been added to the chipset. At the end of the day, it reads data, combines it and feeds it into a DAC. We just needed modes that provided it with a direct CLUT index and could skip any serial/parallel conversion. Being able to read the data sequentially should have optimised the operation, even if it needed alignment.
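The serial/parallel stage that a direct CLUT index would skip can be modelled in software. A sketch with a hypothetical name, assuming 16-bit fetches and plane 0 as the least significant colour bit: each plane's word is loaded into its own shift register, and every pixel clock shifts one bit out of each register to form the index. In a chunky mode each fetched byte would already be that index.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the bitplane display path: one 16-bit shift register per
   plane, one bit from each per pixel, combined into a colour index. */
static void planar_serialize(const uint16_t words[], int depth, uint8_t out[16])
{
    assert(depth >= 1 && depth <= 8);
    uint16_t shifter[8];
    for (int p = 0; p < depth; p++)
        shifter[p] = words[p];                    /* load from DMA fetch */
    for (int px = 0; px < 16; px++) {
        uint8_t index = 0;
        for (int p = 0; p < depth; p++) {
            index |= (uint8_t)((shifter[p] >> 15) << p);
            shifter[p] = (uint16_t)(shifter[p] << 1);
        }
        out[px] = index;                          /* would feed the CLUT */
    }
}
```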

TIGA was more flexible than VGA and RTG solutions were provided for the Amiga in the case of the A2410. TI tried to sell GPU solutions for console use which CBM could have used for the Amiga.

https://en.wikipedia.org/wiki/TMS34010#Game_console Quote:

TI made an unsuccessful effort in 1987 and 1988 to convince games makers such as Nintendo and Sega to write 3D games and create a new console market. In 1987 TI provided the first demonstration of true real-time 3D games with stereo sound effects on a personal computer (PC), using a small TMS34010 adapter card (called "The Flippy"). The Flippy was designed as the basis of a game development system for consoles and as a PC gaming card in its own right.

CBM was more interested in cost-reducing the Amiga into a C64 at all costs, but by the time they nearly achieved their goal, the technology was practically outdated because they hadn't enhanced it.

Early (S)VGA hardware lacked flexibility and programmability but became more flexible over time, as did the Amiga graphics, where more resolutions, formats and pixel clocks were added, until practically fully programmable modes could be created, limited more by memory bandwidth than by anything else. Despite PC graphics hardware being foreign to the Amiga, ThoR managed to get relatively low-cost screen dragging, with some limitations, working on nearly all P96 RTG-supported graphics cards, some of which use (S)VGA hardware. Monitor switcher and pass-through settings, plus multiple display support including dragging from one display to another, give the Amiga relatively modern RTG features even on ancient hardware. Fully integrated graphics support provides the best experience, and this is where CBM dropped the BoingBall.

Hypex Quote:

Actually, I don't think the blitter would need much adapting for chunky at all. At the sides you have a mask as always, but this can be set to mask off nibbles or bytes as required. A whole-image mask need not be a pure bit mask that has to be scaled across to match the packed data; it can simply be organised as a packed mask. Then packed blits are simply straight bits as always, but to function, the blitter needs to handle the extra width, so it would need the ability to process up to 8 times the width for 8-bit packed. Blitting from planar to packed or vice versa is more complicated and I see no immediate need for it. Lines could still work, since the line can be stretched across with the packed data as the texture.

Chunky/packed blitters exist and they mostly move, mask and shift data too. If the Amiga blitter could be enhanced to support chunky/packed data, it would be interesting to compare to using the CPU and SIMD/vector unit (Amigas with 68020+ CPUs often switched to CPU blitting). The blitter has longer setup times than using the CPU so a SIMD/vector unit which can saturate the memory bandwidth may have an advantage (this is the choice of the Vampire/Apollo hardware).

The original Amiga could have switched to the more expensive 68020 released in 1984 and used bitfield instructions while removing the blitter from the custom chips to save space. The 68000 had anemic shift performance necessitating the blitter while the 68020 was a big improvement even though the bitfield instructions were somewhat slow (they provide even misaligned mask and shift in a single instruction though). The 68040 improved the bitfield instruction performance while the 68060 improved shift performance up to 2 shifts per cycle but the bitfield instructions were never optimized so manual mask and shift is fastest despite the better code density when using the bitfield instructions.
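What a bitfield extract does in one instruction can be written out in C to show the mask and shift it replaces. A sketch modelling a 68020+ `BFEXTU` with a memory operand, for non-negative offsets only (the real instruction also accepts negative offsets and register operands); the name `bfextu_model` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* The 'width'-bit field (1..32) starting 'offset' bits after the MSB of
   base[0], zero-extended. Only the bytes the field covers are read,
   which can be up to five. */
static uint32_t bfextu_model(const uint8_t *base, uint32_t offset, unsigned width)
{
    assert(width >= 1 && width <= 32);
    const uint8_t *p = base + offset / 8;
    unsigned bit = offset % 8;
    unsigned nbytes = (bit + width + 7) / 8;      /* 1..5 bytes */
    uint64_t window = 0;
    for (unsigned i = 0; i < nbytes; i++)
        window = (window << 8) | p[i];
    window >>= nbytes * 8 - bit - width;          /* drop bits after the field */
    return (uint32_t)(window & ((UINT64_C(1) << width) - 1));
}
```

On the bytes 12 34 56 78 9A, an 8-bit field at offset 4 is 0x23, and a 32-bit field at offset 7 is 0x1A2B3C4D, which needs all five bytes.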

Hypex Quote:

It's been discussed that odd sizes are technically feasible.

Sure, but were there any chunky/packed blitters which supported misaligned pixels? Doesn't the fact that they were not popular or did not exist tell the story about their practicality?

Last edited by matthey on 06-Aug-2022 at 05:14 PM.


cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 6-Aug-2022 20:22:38

[ #105 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@Karlos

Quote:

Karlos wrote:
@cdimauro

I don't doubt for a second that packed pixels will be faster for generic drawing use cases. I'm just curious to see them in action.

It (just) requires an implementation, then...

@Hypex

Quote:

Hypex wrote:
@cdimauro

Okay so dragging this discussion into here for some closure.

It'll be closed once my article is published.

Which will be by next week, hopefully, but we're still working to restart the tech blog. The article is already done, BTW, albeit in Italian, because the blog currently publishes articles in that language. You can use Google to translate it while keeping the formatting, or deepl.com for a more accurate (text-only, unfortunately) translation.

I'll post the link.

For this reason I won't continue here, except for some things that I want to address and/or clarify: you'll find my analysis, details and data in the article, which is quite big: 40kB of pure text plus a few images (that's why it took so long to finish).
Quote:
Quote:
You're further complicating the situation trying to use such hybrid bitplane / packed formats.

The best way would be implementing it as simple as possible. At 1 bit they are equal. I imagine a BPLCON register to handle the case taking pixel width. Either direct or a multiplier. Depth would be only one plane but pixel width would specify to use packed pixels at set bpp.

This isn't needed for a packed-graphics (only) Amiga. Maybe I'll cover this in a new, more specific article in the near future.

The same applies to other things written here: I'll not quote and reply now, but I'll do it in a more structured way in the new article.
Quote:
Another idea I had was an indirect packed mode. Likely restricted to 8 bits. So, somehow with copper list, OCS/ECS can do 4 pixels a move and AGA 2 pixels a move. I'm sure there's some trick as I read it's half that in actual speed. So, I was thinking, if the copper was disabled and instead used as a direct index. Depending on bandwidth the bitplanes would or would not also be disabled. So the copper could setup the screen and sprites but once the pointers are set another copper move disables the copper. When the next line is rastered, instead of DMA reading in copper codes, it reads in a direct CLUT index so it would be restricted to 8 bits. Sounds more simple in my head than that explanation.

An example would be beneficial.
Quote:
Quote:
Indeed. If you understand how the display controller and Blitter work with the regular bitplanes, and apply the same operations to packed graphics, you'll see that there's absolutely no difference, whatever the pixel size: you always need shifting & masking when there are scrolling playfields to be displayed, or for operations like the cookie-cut.

But, regarding the blitter, I see it as quite easily being able to blit packed data. Provided it has the width for it. Even arbitrary pixel sizes are no problem. Just mask off the edges.

Exactly.
Quote:
Quote:
I think that math is enough to prove that packed graphics is almost always better than planar.

If you reduce the pixels to generic packed data in arbitrary sizes of course the math will always be in favour of packed. But this wasn't in dispute. What was is if the data size needs restrictions in the display controller since it is pixel data.

The display controller was the first thing addressed in my article. Just wait and you'll have all the answers.

BTW, for planar graphics I've covered the 3 most famous formats: Atari ST-like (per words interleaved), interleaved (per row), and "Amiga" (completely free: a pointer for each bitplane).
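The three planar formats differ only in where a given plane's bits for a pixel live. A sketch of the address arithmetic, assuming a width that is a multiple of 16 and hypothetical names; the pixel's bit inside the returned byte is `7 - x % 8` in all three cases:

```c
#include <assert.h>

typedef enum {
    LAYOUT_SEPARATE,          /* "Amiga": one pointer per plane */
    LAYOUT_ROW_INTERLEAVED,   /* per row: plane 0 row, plane 1 row, ... */
    LAYOUT_WORD_INTERLEAVED   /* Atari ST-like: one word per plane per 16 pixels */
} PlanarLayout;

/* Byte offset of pixel (x, y) in plane p. For LAYOUT_SEPARATE it is
   relative to that plane's own buffer; for the interleaved layouts it is
   relative to the single shared buffer. */
static long planar_byte_offset(PlanarLayout layout, int x, int y, int p,
                               int width, int depth)
{
    assert(width % 16 == 0 && p >= 0 && p < depth);
    long row_bytes = width / 8;   /* bytes in one plane's row */
    switch (layout) {
    case LAYOUT_SEPARATE:
        return (long)y * row_bytes + x / 8;
    case LAYOUT_ROW_INTERLEAVED:
        return ((long)y * depth + p) * row_bytes + x / 8;
    case LAYOUT_WORD_INTERLEAVED:
        break;
    }
    /* Word-interleaved: the 16-pixel group's words for all planes are
       consecutive, plane 0 first. */
    long word = ((long)y * (width / 16) + x / 16) * depth + p;
    return word * 2 + (x % 16) / 8;
}
```

For a 32-pixel-wide, 4-plane bitmap, plane 0 of pixel (25, 2) is at byte 11 of its own plane, byte 35 of the row-interleaved buffer and byte 41 of the word-interleaved one.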
Quote:
Quote:
Correct. And if you understand how the display controller works when it has to render a scan line, pixel by pixel, then you might finally get why its implementation would have been MUCH easier with packed graphics (sprites included).

I can see how easier it would be by only reading one block of chip ram. Rather than needing multiple blocks. And the parallel to serial conversion.

Indeed. As soon as you start thinking about how some functionality works (display controller, sprites, get/set pixel, lines, rectangles, etc.), you understand the benefits and pitfalls of packed and planar graphics, and can then draw the proper conclusions.
Quote:
Quote:
Well, you said exactly the opposite before...

Looking back, what I couldn't imagine, to use that exact phrase, was the side effect of splitting the scroll offsets on the Amiga with packed hardware, because it was a side effect of the Amiga hardware setup. It's not clear-cut: apart from my being unable to find any code demonstrating it (and yes, I should have just written it), it was a side effect of splitting even and odd plane offsets in *single* playfield mode. By your response, it would need to be duplicated with a *dual* playfield mode.

No duplication, because you don't have even or odd planes with packed graphics: you just have pointers to plane's data, and those planes & data are completely independent from each other.

This will be covered on the new article.
Quote:
What else I couldn't see was related to practicability of packed sizes.

Covered in the article.
Quote:
Quote:
Everything was already rebutted. And examples were also given which clearly showed that bitplanes score no points on the common graphic primitives.

Are you throwing in a red herring? That would be a logical fallacy.

My dispute was practicability of arbitrary bit sizes of pixels inside the display controller.

Same here.
Quote:
Quote:
You don't see it because you're not able to distinguish the display controller SETUP (I've highlighted it) for one or two playfields to be shown (which is absolutely the same, besides the difference of one or two pointers to the graphics) from the RUNTIME operation of such a controller.

So as a means to an end it needs two playfields to duplicate a one playfield trick?

No. I mean that the Amiga requires the bitplane pointers to be split into even and odd ones to store / retrieve the data for each of the two playfields. So bitplane pointers #0, #2, #4 (and #6 on AGA) define the data for playfield #1, and #1, #3, #5 (and #7 on AGA) for playfield #2.

Whereas with packed data you just have bitplane pointer #0 which points to the data of playfield #1 and bitplane pointer #1 to the data of playfield #2.

Everything else is exactly the same.

And, as you can already see, packed graphics is much less complex (and more efficient, looking at the numbers).
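To make the pointer split concrete, here is a sketch of how a pixel's colour is assembled in six-plane dual playfield mode, with planes numbered from 0 as above. The names are hypothetical; the colour selection assumes the OCS convention of registers 0-7 for playfield 1 and 8-15 for playfield 2, value 0 as transparent, and a priority bit in the role of PF2PRI. In a packed design, `pf1` and `pf2` would instead come straight from two 3-bit pixel reads through two pointers, and `pf_value` would disappear.

```c
#include <assert.h>

/* 'bits' has bit p set when plane p is set at this pixel. Playfield 1
   gathers planes 0, 2, 4 (first_plane = 0); playfield 2 gathers planes
   1, 3, 5 (first_plane = 1). */
static unsigned pf_value(unsigned bits, int first_plane)
{
    unsigned v = 0;
    for (int i = 0; i < 3; i++)
        v |= ((bits >> (first_plane + 2 * i)) & 1u) << i;
    return v;
}

/* Final colour register for a pixel, given both 3-bit playfield values. */
static unsigned dual_playfield_colour(unsigned pf1, unsigned pf2, int pf2_in_front)
{
    assert(pf1 < 8 && pf2 < 8);
    if (pf2_in_front && pf2) return 8 + pf2;
    if (pf1) return pf1;
    if (pf2) return 8 + pf2;
    return 0;                             /* both transparent: background */
}
```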
Quote:
Quote:
I think that you've no clue on how the Amiga hardware was programmed AND how effectively it worked color clock by color clock. This could explain your problems.

My problems are actually knowing what the trick did, what it was called and where to find some damned example code for it.

Besides the code, the article covers everything.

If you need code (I don't: math is enough, as I've already said), you can start writing it.

@Karlos

Quote:

Karlos wrote:
I still believe that N-bit packed pixels will be essentially no more complex than a single 1-bit bitplane: you'll still need to start somewhere at a machine aligned boundary and use shifts and masks to isolate pixels. The only complication for 3,5,6 and 7 bit packed pixels will be where the mask for a pixel spans a machine word boundary. This problem doesn't exist for 1,2,4 and 8 bit depths.

It doesn't exist for 3, 5, 6 and 7-bit packed pixels either, if you start thinking about how to implement horizontal scrolling (in general).

Then you'll see that, from this perspective, there's absolutely no difference, whatever the screen depth.

IMO this is the key point to let people go beyond the powers of two and finally have the correct mindset to understand how the things really go.
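Karlos' distinction can be checked by counting how many pixels of an unscrolled row cross a word boundary. A sketch with hypothetical names: the count is zero for 1, 2, 4 and 8 bits and non-zero for 3, 5, 6 and 7, so supporting those depths means reading a two-word window. cdimauro's argument is that a fine-scroll shifter already needs such a window at every depth.

```c
#include <assert.h>

/* Does pixel x of a row packed end to end at 'bpp' bits per pixel,
   starting on a 16-bit word boundary, span two words? */
static int straddles_word(int x, int bpp)
{
    assert(bpp >= 1 && bpp <= 16 && x >= 0);
    long first = (long)x * bpp;
    long last = first + bpp - 1;
    return first / 16 != last / 16;
}

/* Number of pixels in a row of 'width' pixels that straddle a boundary. */
static int count_straddling(int width, int bpp)
{
    int n = 0;
    for (int x = 0; x < width; x++)
        n += straddles_word(x, bpp);
    return n;
}
```

A 320-pixel row at 3 bits spans 60 words; 40 of its 59 internal word boundaries fall inside a pixel.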

@Hypex

Quote:

Hypex wrote:
@cdimauro

Quote:
See above: alignment is also very important, even with widths which are multiples of the data bus size.

A case like 121 in width wouldn't be a screen width and would be an image width so can be more flexible.

Explained in the article, and tightly bound to the above-mentioned horizontal scrolling (think about it, and you'll see why "odd widths" aren't really odd).
Quote:
Planar is more suitable as a format at higher bit depths. And when the pixel width isn't a multiple of 8. At 8 bit depth planar has outdone direct pixels!

8 bit depth is certainly helping a lot in most cases, however it has still many points in common with the "odd" (and even... "even") depths.

Don't believe it? Again, think about the display controller implementing horizontal scrolling...


cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 6-Aug-2022 21:20:03

[ #106 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@bhabbott

Quote:

bhabbott wrote:

But how much better might chunky be for 2D games and other applications? To answer that question we need code!

Code is not needed: math (and analysis) should be enough to prove it.

However, implementing it might be an interesting exercise for people for whom code is a requirement for understanding the challenge.

@Hammer

Quote:

Hammer wrote:

Apollo-core's AC68080 attempted to be "AMD" for the 68K family and it was distracted from accelerating the 68K mission. Apollo-core attempted to fork with their own AmigaOS clone via customized AROS, AMMX extensions and compromised FPU. Another Genesi (with Phase 5/DCE/bPlan) wannabe.

Actually, Apollo's 68080 is more like an Intel attempt than an AMD one, due to its design decisions.
Quote:
AMD didn't compromise the X87 FPU instruction set.

It did, with 3DNow!
Quote:
PiStorm+Pi3a (or Pi Zero 2W) method followed AMD K5 (with Am29K RISC-like core)/Transmeta Code Morph/NVIDIA Project Denver below the OS level methods. This method enables unfriendly 68K Amiga software to be transparently CPU accelerated.

Hmm. It depends on which 68K features the Amiga software is using.

Emulating a 68K is easy and fast / efficient when you only focus on user-mode instructions and nothing else, but could be quite complex and super slow if something else is used (e.g.: some PMMU features, just to give an example).
Quote:
For Doom and Quake-type games with sufficient CPU power, Commodore's 16-bit/32-bit AGA with 32-bit Chip RAM is superior when compared to IBM's original VGA.

Easy win: VGA arrived in 1987...
Quote:
AGA is sufficient for PC's Doom and Quake era games when coupled with sufficient CPU power. AGA is full motion video capable.

It's very difficult to achieve, since you don't just need to copy data from fast to chip memory; realistically, you also have to do other tasks / computations.

@bhabbott

Quote:

bhabbott wrote:
@Trekiej

If you were designing a circuit using standard TTL chips etc. (not HDL) I think bitplanes would be easier. For a 2 color bitmap the circuitry is the same - fetch a word and shift it out 1 bit at a time to create the video. For 4 colors just duplicate the circuit and feed the separate bitstreams into a DAC (possibly through a CLUT) for analog, or straight out the video connector for digital. 8 colors, 16, 32 all are the same, just with more DMA channels and shift registers. You could produce a board with 1 bitplane, then for more colors simply wire more boards in parallel. 3 boards would give you 8 color RGB, the minimum for a 'true' color display.

The chunky method traditionally involved a 1 bitplane screen whose output bits were then combined in pairs or quads to produce more colors but at lower resolution. So if the data is fetched 8 bits at a time you could have 8 pixels per word in 2 colors, or 4 pixels in 4 colors, 2 pixels in 16 colors or 1 pixel in 256 colors. But the resolution would reduce accordingly, from eg. 640 to 320 to 160 to 80 256 color pixels per line. But that didn't work with pixel groupings that didn't pack into a single word, so you couldn't have 8, 32, 64 or 128 colors without wasting bits, and so couldn't take advantage of the potential memory savings.

Actually it's the opposite: packed / chunky graphics doesn't require different "boards" to be wired in parallel to support pixels of any depth. The display controller just fetches the "words" from memory and then masks and shifts them according to the depth, whatever it is.

It's much simpler than planar graphics.
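That fetch-mask-shift loop can be modelled with a single bit accumulator. A sketch with hypothetical names: 16-bit words are appended to the accumulator, and `bpp` bits are taken from its top for each pixel, refilling whenever fewer than `bpp` bits remain. Only the field width changes between depths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn a packed row of 16-bit words into up to 'max_pixels' colour
   indices of 'bpp' bits (1..8). Returns the number of pixels produced. */
static size_t packed_serialize(const uint16_t *words, size_t nwords, int bpp,
                               uint8_t *out, size_t max_pixels)
{
    assert(bpp >= 1 && bpp <= 8);
    uint32_t acc = 0;   /* valid bits are the low 'avail' bits of acc */
    int avail = 0;
    size_t w = 0, n = 0;
    while (n < max_pixels) {
        if (avail < bpp) {
            if (w == nwords)
                break;                            /* row exhausted */
            acc = (acc << 16) | words[w++];       /* fetch next word */
            avail += 16;
        }
        avail -= bpp;
        out[n++] = (uint8_t)((acc >> avail) & ((1u << bpp) - 1));
    }
    return n;
}
```

Three words at 3 bits per pixel yield exactly 16 pixels, and a single word 0x1234 at 4 bits yields the indices 1, 2, 3, 4.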
Quote:
On the software side you have a similar issue. Converting between different resolutions and packing pixels into words might not be a problem for modern systems, but what if you were designing the original Amiga today, with all the same constraints? Originally it was to have an 8 bit CPU and 128k RAM. Fortunately that was changed to a 68000 and 256k, but the 16 bit CPU and sophisticated GUI used more RAM. So you would want your fonts and other imagery to take up minimum space in single bitplanes, and be rendered without having to repack them for different color depths. With multitasking in limited RAM you also want to be able to use memory chunks that are not contiguous. Smaller individual bitplanes can be squeezed into spaces that a full chunky bitmap wouldn't fit.

Wrong: packed / chunky can do it as well. You'll see in my article.

And, on the exact contrary, packed is on average more efficient than planar graphics.
Quote:
If the Amiga used packed pixels you can bet that it wouldn't have had a 32 color mode.

You would have lost your bet then...
Quote:
The next step up of 256 colors would be too expensive or too lo-res, so it would be stuck with 16 colors max like the ST, EGA, and numerous other home computers. With only 16 colors it probably wouldn't have been worth having a 4096 color palette either.

Wrong again: 320x200 (NTSC; 320x256 for PAL/SECAM) in 256 colors requires exactly the same bandwidth as 640x200 (640x256) in 16 colors, which was already possible with the OCS, as Karlos already reported.

Plus, the Acorn Archimedes showed a more extended version of EHB which didn't require a full 256-color palette to be used/defined.
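The bandwidth claim is plain arithmetic: a lores line and a hires line take the same time, so equal bits per line means equal bandwidth. A sketch with hypothetical helper names, which also gives the frame sizes:

```c
#include <assert.h>

/* Bits the display must fetch per scanline, for either format. */
static long bits_per_line(long width, int depth)
{
    assert(width > 0 && depth > 0);
    return width * depth;
}

/* Bytes for one full frame. */
static long frame_bytes(long width, long height, int depth)
{
    return bits_per_line(width, depth) * height / 8;
}
```

320 pixels at 8 bits and 640 pixels at 4 bits are both 2560 bits per line; a 320x200 frame at 256 colors is 64,000 bytes (81,920 bytes at 320x256).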
Quote:
Then we wouldn't have gotten those groundbreaking titles like Defender of the Crown and Deluxe Paint that showed off those extra colors.

See above: they could have been much better supporting 256 colors (even with a 32 colors CLUT).
Quote:
Plus we wouldn't have dual playfields,

Wrong again: see my previous comment to Hypex.
Quote:
and games and demos wouldn't be able to use as many tricks to wow us.

Wrong again: there's mostly no difference, besides accessing single bitplanes (which taxes packed graphics a lot).
Quote:
So instead of being amazing the Amiga would be meh.

No, it would have been much better on average.
Quote:
In 1982-84 when the Amiga was designed, 256 colors with 1 pixel per byte was considered an extravagance. A single 320x200 screen would have taken up half the RAM, leaving precious little for anything else (forget about double buffering!). Most computers of the time used 16k or less, trading resolution for colors and/or using tiling and attribute colors to save memory and reduce CPU load. The Amiga dropped all of that for the simplicity of multiple bitplanes.

320x200 x 256 colors requires a little less than 64kB (64,000 bytes). The Amiga 1000 was sold with 512KB: enough for several games.

And, BTW, why do you want to put limits on how coders would have used an Amiga 1000 with 256 colors? Let them decide on their own!

@Kronos

Quote:

Kronos wrote:
@kolla

IMO the real question never was planar vs. chunky, that question had been answered when 8Bit became feasible,

As an Amiga game developer, I'd say the question is also planar vs packed / chunky.
Quote:
the real question was shared memory or dedicated VRAM.

As per above, the real question is: what the f*ck was Commodore thinking with "slow mem", which is neither chip nor fast RAM, but the worst of both?!?

As a game developer, I'd have preferred more chip RAM on the Amiga instead of slow RAM, which would have helped in almost all games. Fast RAM isn't that useful.
Quote:
OCS(ECS/AGA) only made sense when paired with the "always 1 or more waitstate" 68000, once you move to the 68020 it killed CPU performance.
Pretty sure an A1200 with 1MB proper FastRAM and a 0.5 or 1MB VGA with some brains (like the ET4000) would have stomped all over the A1200 we got in all productivity and even most of the hard coded games without costing more.

See above: better to have more (all) chip RAM (1MB for OCS/ECS), for games.

@NutsAboutAmiga

Quote:

NutsAboutAmiga wrote:
@Karlos

I guess you might have done the packed format like this:

0,1,2,3 refer to the bits.

0000 0000 – 2 colors, 8 pixels per byte. (Just like planar)
1010 1010 – 4 colors, 4 pixels per byte. (just like std MacOS gfx)
3210 3210 – 16 colors, 2 pixels per byte. (just like std MacOS gfx)

Then you take 2 images and put them on top of each other;
this gives you dual playfield, or a foreground/background chunky image.

4bit+1bit = 5bit 32 colors, or 16 + 2 colors.
4bit+2bit = 6bit 64 colors, or 16 + 4 colors.
4bit+4bit = 8bit 256 colors, or 16 + 16 colors mode.

Almost no masking is needed, and with two layers you don't need to restore the background.

Those are only some possible combinations with packed graphics. In fact, you can have any mix, from 1+1 bits up to 7+1 or 1+7 with a total maximum of 8 bits used.

Packed gives you much greater flexibility for dual playfield mode.
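A minimal sketch of the split-index idea, assuming the front playfield occupies the high bits of each pixel, that front index 0 is transparent, and that one CLUT maps every combined index to a colour. The function names are hypothetical; any split from 1+1 up to 7+1 or 1+7 works the same way.

```python
def combine_index(front, back, back_bits):
    """Place a front-layer index above a back-layer index in one pixel value."""
    return (front << back_bits) | back

def build_palette(front_colors, back_colors, front_bits, back_bits):
    """One CLUT for a split-index dual playfield.

    Front index 0 is treated as transparent, so every entry whose front part
    is 0 shows the back colour; any other front index wins.
    """
    assert len(front_colors) == 1 << front_bits
    assert len(back_colors) == 1 << back_bits
    back_mask = (1 << back_bits) - 1
    palette = []
    for index in range(1 << (front_bits + back_bits)):
        f, b = index >> back_bits, index & back_mask
        palette.append(front_colors[f] if f else back_colors[b])
    return palette

# 4+4 split: two 16-colour layers in one 8-bit pixel
front = [f"F{i}" for i in range(16)]
back = [f"B{i}" for i in range(16)]
pal = build_palette(front, back, 4, 4)
print(pal[combine_index(0, 5, 4)], pal[combine_index(3, 5, 4)])  # B5 F3
print(len(pal), len(set(pal)))  # 256 31
```

With a 4+4 split the 256-entry CLUT holds only 31 distinct colours, which is why this trick is wasteful of palette entries; other splits (5+3, 6+2, ...) trade front and back colours differently.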
Quote:
You get major memory savings by being able to use 32 and 64 color modes instead of a 256 color mode. You won't get the 8 or 128 color modes, but it's pretty close to all the modes the Amiga has. Setting up 5-bit and 6-bit colors is a bit complicated, but not as complicated as 5/6 bitplane modes, and you can do the cool transparency palette effects without breaking a sweat.

You could do the same with packed, with better savings compared to planar.

You'll see it in my article.
Quote:
and then you have the normal 8bit / 256 colors chunky mode for 3D.

As pointed out, HAM modes work on the palette lookup table, and do affect how color indexes are stored in the image. I keep the Copper, as it can set the start position of the pixel clock, change the width of the data fetch, place sprites and so on, like normal.

When displaying it, this will work pretty well with a dual shift register: shift in the image with the highest-numbered bits first, then shift in the one with the lowest bits, and OR the results into a mixed color index. The fetch rate will be different for the two images, but that should not be an issue.

I guess this is how a chunky-based Amiga chipset could have been designed in the 80's, instead of OCS/AGA. Then, in an alternative-history 90's, they might have slapped on 15-bit or 16-bit, and 32-bit true color graphics.

Correct. But see above.

P.S. It's late now. Tomorrow I'll reply to other comments on this thread.

Status: Offline

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 7-Aug-2022 6:19:08

[ #107 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@matthey

Quote:

matthey wrote:

The Amiga suffered from a lack of memory bandwidth which applied to the whole SoC as CBM not only used the cheapest and slowest chip memory DRAM but Amigas often lacked fast memory to offload CPU memory bandwidth requirements.

As a game developer, I don't agree: it wasn't fast memory that was required, but chip mem. ONLY chip mem. This would have helped A LOT in the vast majority of cases, since the CPU is often used just to load the chipset registers to start some operations (specifically: setting up the Blitter).
Quote:
The lack of memory bandwidth, especially for the Amiga chipset, was more of a problem than the choice of planar support over chunky graphics support.

The lack of memory bandwidth is ALSO due to the unlucky choice of planar instead of chunky graphics.

My article will show it with numbers at hand, since it specifically targets and reports how many memory reads and/or writes are required by the display controller or by some graphic primitives.
I've only considered the case of 8 colors = 3 bits per pixel to keep the article simple (it's already 40kB of text, as I've said), but the analysis could easily be redone for any pixel size.
I've also primarily considered a system with an 8-bit data bus, but sometimes I've reported the analysis for 16, 32, and 64-bit data buses, to show how the analysis and numbers scale (in favor of packed graphics, of course: the wider the data access granularity, the bigger the benefits of this format).
Quote:
The lack of chipset memory bandwidth was apparent even with AGA which was perceived as being slow even with moderate chipset memory bandwidth improvements still using cheap DRAM. CBM using cheap 68k processors and often no fast memory also added to the handicap while even low AGA memory bandwidth was shown to be adequate to avoid a C2P bottleneck at lower gaming resolutions like 320x200x8. Later FPGA hardware using planar graphics doesn't feel sluggish even at higher resolutions supported by AGA as memory bandwidths are greatly increased. I believe CBM thought planar was adequate up to 8 bit planes as AA+ did not add an 8 bit semi-chunky CLUT mode but did add 16 bit true chunky modes. This not only shows that CBM was capable of adding non-planar Amiga chipset support but also believed 8 bit planar as delivered with AGA was competitive. Breaking the vision and mindset to cost reduce the Amiga 68000+ECS into a C64 died hard only as Amiga 1200 68020+AGA computer sales outperformed Amiga 600 sales which was too late.

See above: games required more bandwidth, and Commodore completely failed to provide it, since the improved memory bandwidth (compared to OCS/ECS) was reserved only for the display controller.

The main problem with AGA is that the Blitter was left EXACTLY the same, so it still handles only 16 bits at a time instead of 32 or 64 bits. The ONLY advantage of AGA here is that screens using 32 or 64-bit memory accesses freed more memory slots compared to OCS/ECS, so the Blitter "automatically gained" more memory accesses and bandwidth, but not enough to sustain certain loads.

However, I have to say in its defense that 32 and 64-bit data buses could have helped on some workloads, but they would have greatly exacerbated some other cases, wasting memory and/or memory bandwidth. That's because planar behaves in exactly the opposite way to packed graphics: the wider the data bus, the more inefficient this format becomes (see above). This is also explained in my article.
Quote:
Chunky is relatively simple and cheap to add to the chipset.

Exactly. It's a bit more complicated only when cookie-cut blits (e.g. using a mask to handle graphics overlapping the background) have to be implemented, but this can be overcome with a proper pipelined implementation (which the Blitter already has, BTW).
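As a concrete illustration of a cookie-cut on packed data, here is a SWAR (SIMD within a register) sketch for eight 4-bit pixels held in one 32-bit word. It assumes colour 0 is transparent, so the mask is derived from the source pixels themselves instead of a separate mask plane. This is one possible technique, not how the Blitter or any specific chipset does it, and the names are hypothetical.

```python
NIBBLE_LSBS = 0x11111111  # bit 0 of each of the eight 4-bit pixels

def opaque_mask_4bpp(src):
    """Return a 32-bit mask with 0xF in every nibble where src is non-zero."""
    t = src | (src >> 1)   # fold bit k+1 into bit k
    t |= t >> 2            # bit 0 of each nibble now ORs that nibble's 4 bits
    t &= NIBBLE_LSBS       # keep one flag bit per pixel
    return t * 0xF         # widen each flag to a full nibble (no carries)

def cookie_cut_4bpp(dest, src):
    """Copy the non-transparent (non-zero) 4-bit pixels of src over dest."""
    mask = opaque_mask_4bpp(src)
    return (dest & ~mask & 0xFFFFFFFF) | (src & mask)

print(hex(cookie_cut_4bpp(0x12345678, 0x0A00B000)))  # 0x1a34b678
```

The whole word is processed at once; only the mask derivation depends on the pixel size, which is where the extra logic goes.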

@kolla

Quote:

kolla wrote:
On the original Minimig, access to chipram is about 3 times faster than on real ECS Amiga systems. Likewise, AGA on MiST and especially MiSTer, access to chipram is a lot faster than on "real" Amiga, and it does make a difference. What I hoped for with SAGA wasn’t new chunky modes, but rather old planar AGA modes, only faster, scandoubled/flickerfree, and (optionally) with more chipram - the ultimate DeluxePaint workstation :)

Why? Deluxe Paint (and PersonalPaint) already supported packed/chunky modes with RTG. Workbench and applications as well (unless they make some dirty assumptions). And handling graphics this way is much more efficient and faster compared to planar.

@matthey

Quote:

matthey wrote:
Trekiej Quote:

I guess I still trying to understand the versatility of the planar arrangement.

Chunky still looks straightforward to me, and I am thinking of a more modern machine that has more hardware to use.

Karlos and bhabbot sum up planar vs chunky pretty well in post 83 and 84 of this thread. Chunky is easier for a human programmer but the hardware logic for planar is simple and consistent as the number of colors increases.

That's wrong. As I've said before, this only happens when you've to implement masking on the Blitter. In other cases (display controller, sprites, and implementing graphic primitives in general) packed/chunky is simpler (even much simpler).

Previously you also said that chunky is cheap to implement: why this regression here?
Quote:
The down side is that performance drops due to increased memory accesses as the number of colors increases.

Exactly.
Quote:
Chunky is most efficient when pixel accesses are always naturally aligned as misaligned pixels can be 2 memory accesses and requires extra shifting logic in hardware.

Same here: wrong. Misaligned memory accesses for packed graphics aren't that frequent, and when they do happen, this worst-case scenario is often still better than the equivalent best-case scenario for planar graphics. I've reported it in my article.
Quote:
Chunky graphics with naturally aligned pixels
2^1=2 colors
2^2=4 colors
2^4=16 colors
2^8=256 colors (OCS/ECS bandwidth inadequate)

Planar Amiga graphics
2^1=2 colors
2^2=4 colors
2^3=8 colors
2^4=16 colors
2^5=32 colors
2^6=64 colors (HAM6, EHB)
2^7=128 colors (AGA bandwidth required)
2^8=256 colors (AGA bandwidth required)

Instead of a chunky Amiga only offering 2, 4 or 16 color graphic modes, the planar Amiga is more colorful and flexible/scalable with 2, 4, 8, 16, 32 and HAM6 modes.

Wrong again: I've already explained this in other comments, but I've addressed it more specifically in my article.
Quote:
Increasing chip memory bandwidth was very expensive up until about 1987 when VRAM prices dropped allowing for double the memory bandwidth for only about a 20% increase in price and this would have made 256 color chunky possible in low resolution.

Which would have been VERY good, as I've explained yesterday.

@Hypex

Quote:

Hypex wrote:
@NutsAboutAmiga

Quote:
4bit+4bit = 8bit 256 colors, or 16 + 16 colors mode.

Almost no masking is needed, and with two layers you don't need to restore the background. You get major memory savings by being able to use 32 and 64 color modes instead of a 256 color mode. You won't get the 8 or 128 color modes, but it's pretty close to all the modes the Amiga has. Setting up 5-bit and 6-bit colors is a bit complicated, but not as complicated as 5/6 bitplane modes, and you can do the cool transparency palette effects without breaking a sweat.

I read somewhere that they do dual playfield on chunky like this. So two 16 colour fields.

No, you can have much more flexibility with packed graphics: see my previous comment on that specific point.
Quote:
The scrolling can't be independent since it's using one framebuffer.

It can be, if you're computing the final framebuffer taking the data from the two playfields and combining them.
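A minimal sketch of that software composition for one scanline, assuming each playfield is a row of colour indices, index 0 in the front layer is transparent, and both layers wrap around horizontally (a simplifying assumption; real games would clip or use wider buffers). The function name is hypothetical.

```python
def compose_line(front, back, front_scroll, back_scroll, width):
    """Build one output scanline from two independently scrolled playfields."""
    out = []
    for x in range(width):
        f = front[(x + front_scroll) % len(front)]
        # a transparent front pixel lets the back playfield show through
        out.append(f if f != 0 else back[(x + back_scroll) % len(back)])
    return out

front = [0, 0, 9, 9, 0, 0, 0, 0]
back = [1, 2, 3, 4, 5, 6, 7, 8]
print(compose_line(front, back, front_scroll=1, back_scroll=3, width=6))
# [4, 9, 9, 7, 8, 1]
```

Each layer keeps its own scroll offset; only the final framebuffer is shared, at the cost of touching every output pixel per frame.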
Quote:
But the method was to load one field into the upper nibbles, then logically OR the other field into the lower nibbles. For ease of processing, the upper field would have all images in the top nibble and the lower field all images in the lower nibble. Then no masking is needed, unless sprite objects are being added to a field. Of course, the palette then has to be calculated so that only a maximum of 32 colours appears in 256 colour mode.

Correct: this is an easy and very cheap way to implement a dual playfield screen on a system which only has an 8-bit packed mode.

However you're wasting a lot of colors.
Quote:
Aside from that I would have imagined writing background layer, then overlaying foreground layer and sprites on top. Lots of writing and processing by hand so slightly wasteful. But would have less colour limits.

Indeed, and I think that this was the way that it was implemented in PC games with multiple overlapping playfields.

@Hypex

Quote:

Hypex wrote:
@matthey

Quote:
The Amiga chipset legacy blitter does not work with chunky and more CPU performance, memory and memory bandwidth is needed for 16 bit chunky in AA+ but that could have been easily remedied by CBM using higher performance CPUs provided they could change their vision (Lew Eggebrecht VP of Engineering at CBM who was developing AA+ seemed to have the necessary understanding but too many mistakes were made before and above him). Integrating the CPU and GPU has the potential to improve performance, power efficiency and cost as can be seen by the move back to it with SoCs today.

Actually, I don't think the blitter would need much adapting for chunky at all. At the sides you have a mask as always, but this can be set to mask off nibbles or bytes as required. A whole-image mask need not be a pure bit mask that has to be scaled across to match the packed data; it can simply be a mask organised in packed form. Then packed blits are simply straight bits as always, but to function, the blitter needs to handle the extra width, so it would need the ability to process up to 8 times the width for 8-bit packed. Blitting from planar to packed or vice versa is more complicated and I see no immediate need for it. Lines could still work, since the line can be stretched across with the packed data as the texture.

Quote:
Chunky is most efficient when pixel accesses are always naturally aligned as misaligned pixels can be 2 memory accesses and requires extra shifting logic in hardware.

It's been discussed that odd sizes are technically feasible.

Quote:
Instead of a chunky Amiga only offering 2, 4 or 16 color graphic modes, the planar Amiga is more colorful and flexible/scalable with 2, 4, 8, 16, 32 and HAM6 modes.

It's been discussed here and elsewhere that all those sizes can work with packed as well with no need for alignment. Packed end to end. They can even all fit on one whole line with no end gaps.

Now, I naturally assumed that only powers of two would be used, as in the examples you provide. And any other lesser sizes would need aligning to the next power up. But, it is possible that dynamic width pixels could be supported, the hardware just needs to mask the pixels out.

Had the Amiga supported packed, it's reasonable to assume the same 16-bit width would apply to pixel reads, so a multiple of 16 bits may have restricted the packed bits. However, it actually evens out. A 3-bit deep bitmap needs three 16-bit words, one from each plane, to read 16 3-bit pixels. But with 3-bit packed, it also needs three words, just in a row, and they all fit in the same 6-byte space! Do the math and it scales upwards. The pixels all take up the same space regardless of format, and all can fit evenly into word multiples of data.

WOW! Finally you've got how things work with packed graphics. Then I agree with almost everything.
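The word-count argument can be checked for any row width. This sketch assumes each planar row is padded to whole 16-bit words per plane (Amiga bitplane rows are word-based) and that a packed row is padded to a whole word only once at its end; the function names are hypothetical.

```python
def planar_words(width, depth, word_bits=16):
    """Words fetched for one row: each plane is padded to whole words."""
    words_per_plane = -(-width // word_bits)      # ceiling division
    return depth * words_per_plane

def packed_words(width, depth, word_bits=16):
    """Words fetched for one row of end-to-end packed pixels."""
    return -(-(width * depth) // word_bits)

for width in (16, 17, 320):
    print(width, planar_words(width, 3), packed_words(width, 3))
# 16 3 3
# 17 6 4
# 320 60 60
```

For widths that are multiples of 16 both layouts fetch the same number of words; for other widths the per-plane padding makes planar fetch more.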

Only one thing about masking: it can be done exactly like with planar, but implementing it requires a bit more logic (some extra work in a pipeline stage to properly prepare the mask according to the pixel size or sizes of the source channel or channels).
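That extra mask-preparation step can be pictured as widening each mask bit to the pixel size before the usual cookie-cut (source AND mask) OR (destination AND NOT mask). A software sketch, assuming a 1-bit-per-pixel mask with the leftmost pixel in the most significant bit; in hardware this would be bit replication in a pipeline stage rather than a loop. Names are hypothetical.

```python
def expand_mask(mask, n_pixels, bpp):
    """Widen a 1-bit-per-pixel mask into a packed mask of bpp bits per pixel.

    Bit (n_pixels - 1) of `mask` is the leftmost pixel, as in Amiga bitplanes.
    Each set bit becomes bpp ones; each clear bit becomes bpp zeros.
    """
    pixel_ones = (1 << bpp) - 1
    out = 0
    for i in range(n_pixels):
        bit = (mask >> (n_pixels - 1 - i)) & 1
        out = (out << bpp) | (pixel_ones if bit else 0)
    return out

def masked_blit(dest, src, mask, n_pixels, bpp):
    """Cookie-cut: take src where the pixel mask is set, dest elsewhere."""
    m = expand_mask(mask, n_pixels, bpp)
    return (dest & ~m) | (src & m)

print(bin(expand_mask(0b1010, 4, 3)))  # 0b111000111000
```

The same 1-bit mask works for any packed pixel size; only the widening factor changes.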

@matthey

Quote:

matthey wrote:

Early (S)VGA hardware lacked flexibility and programmability but became more flexible over time as did the Amiga graphics where more resolutions, formats and pixel clocks were added but finally practically fully programmable modes could be created limited more by memory bandwidth than anything.

That's incorrect. CGA and MDA already had some flexibility. However EGA introduced A LOT more flexibility and programmability.

VGA was EGA-compatible, so inherited all of them, and "just" added something more (8-bit packed graphics, 18-bit color palette, more bandwidth, more memory).
Quote:
Chunky/packed blitters exist and they mostly move, mask and shift data too. If the Amiga blitter could be enhanced to support chunky/packed data, it would be interesting to compare to using the CPU and SIMD/vector unit (Amigas with 68020+ CPUs often switched to CPU blitting).

SIMD/Vector makes sense only for 8, 16, 24 (more difficult) and 32-bit pixel sizes. It's much more difficult to use such units for less than 8 bits pixel sizes.
Quote:
The blitter has longer setup times than using the CPU so a SIMD/vector unit which can saturate the memory bandwidth may have an advantage (this is the choice of the Vampire/Apollo hardware).

You can remove the "may": packed graphics ALSO has this advantage compared to planar.
Quote:
The original Amiga could have switched to the more expensive 68020 released in 1984 and used bitfield instructions while removing the blitter from the custom chips to save space. The 68000 had anemic shift performance necessitating the blitter while the 68020 was a big improvement even though the bitfield instructions were somewhat slow (they provide even misaligned mask and shift in a single instruction though). The 68040 improved the bitfield instruction performance while the 68060 improved shift performance up to 2 shifts per cycle but the bitfield instructions were never optimized so manual mask and shift is fastest despite the better code density when using the bitfield instructions.

Indeed. 68060 was too late. Before that the CPU could be used on some scenarios to replace the Blitter, but not in the most complicated ones.
Quote:
Sure, but were there any chunky/packed blitters which supported misaligned pixels? Doesn't the fact that they were not popular or did not exist tell the story about their practicality?

This is a logical fallacy, similar to the Argumentum ad populum.

No, something doesn't need to exist to prove whether it's good or not.

To me, the lack of current implementations of odd pixel sizes is probably due to a lack of creativity / thinking outside the box: computer scientists and programmers are used to powers of two, so they think that "this is the (only) way" (TM).

My article gives some details about the implementation of the display controller (specifically), and of some graphic primitives. This could give some input to someone who actually implements it, if people are asking for something "concrete".

To me math is and should be enough, and I'll give proper analysis and numbers.

Status: Offline

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 8-Aug-2022 3:49:56

[ #108 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@cdimauro

My reference to odd-numbered* bit packed pixels having complexity where a pixel's bits span a machine word boundary relates to software rendering, i.e. plotting a pixel in software. Assuming you have regular DRAM with no special write-mask features, plotting a 3-bit pixel that spans any byte-addressable boundary requires two reads and two writes to set every bit in the target pixel without damaging the neighbouring ones. For such an 8 colour screen, 2 pixels in each 3-byte span of 8 pixels require this pair of 8-bit read/write accesses. The only way to do it using a single read/write access for these pixels would be to increase the access size to 16-bit, which only works if there's no penalty for misaligned access, which wouldn't be the case when the 16-bit word itself spans whatever maximum width the bus works at.

That's not to say this isn't better than planar. Even the most naive implementation above requires two read/write cycles 25% of the time, whereas an equivalent planar example requires 3 read/write cycles 100% of the time.

* It's not just odd sizes that this applies to, rather any non power of 2 size. You'll always be able to set all required bits of a pixel in a single read/write for 1, 2 and 4 bit pixels. For 8, 16 and 32, it's a pure write only case, of course.
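The 25% figure can be checked mechanically. This sketch counts, over one repeat period, how many packed pixels straddle a byte (or wider word) boundary, assuming pixels are stored end to end with no padding. For comparison, planar always needs one read/write pair per plane. The function name is hypothetical.

```python
from math import gcd

def spanning_pixels(bpp, word_bits=8):
    """Count pixels whose bits straddle a word boundary.

    Returns (spanning, total) over one repeat period, i.e. the smallest
    run of pixels that ends exactly on a word boundary.
    """
    period_bits = bpp * word_bits // gcd(bpp, word_bits)
    total = period_bits // bpp
    spanning = sum(
        1
        for i in range(total)
        if (i * bpp) // word_bits != (i * bpp + bpp - 1) // word_bits
    )
    return spanning, total

for bpp in range(1, 9):
    spanning, total = spanning_pixels(bpp)
    # packed: one read/write pair, or two when the pixel straddles a byte
    # planar: always one read/write pair per plane, i.e. bpp pairs
    print(f"{bpp}-bit: {spanning}/{total} pixels need 2 byte accesses; planar needs {bpp}")
```

For 3-bit pixels it reports 2 of 8, and the same two straddling pixels per period also appear with 16-bit and 32-bit words; 1, 2, 4 and 8-bit pixels never straddle.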

Last edited by Karlos on 08-Aug-2022 at 03:53 AM.

_________________
Doing stupid things for fun...

Status: Offline

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 8-Aug-2022 5:07:33

[ #109 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@Karlos

Quote:

Karlos wrote:
@cdimauro

My reference to odd-numbered* bit packed pixels having complexity where a pixel's bits span a machine word boundary relates to software rendering, e.g. rendering a pixel in software.

OK, got it, and yes: it's a bit more difficult to handle in software, because of the misalignment that can occur, which requires special code.
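A minimal sketch of that special code: a read-modify-write plot of one packed pixel into a byte buffer, touching one or two bytes depending on alignment. It assumes pixels are stored end to end with the leftmost pixel in the most significant bits (big-endian bit order, as on the 68k); the function name is hypothetical and this is not any particular system's routine.

```python
def plot_packed(buf, x, bpp, color):
    """Write one packed pixel into a byte buffer with read-modify-write.

    Returns how many bytes had to be read and rewritten (1 or 2 for bpp <= 8).
    """
    bit = x * bpp                                         # bit offset of the pixel's MSB
    first, last = bit // 8, (bit + bpp - 1) // 8
    nbytes = last - first + 1
    word = int.from_bytes(buf[first:last + 1], "big")     # read
    shift = nbytes * 8 - (bit % 8) - bpp                  # distance from the LSB
    mask = ((1 << bpp) - 1) << shift
    word = (word & ~mask) | ((color << shift) & mask)     # modify
    buf[first:last + 1] = word.to_bytes(nbytes, "big")    # write back
    return nbytes

buf = bytearray(3)                       # 8 pixels x 3 bits = 3 bytes
accesses = [plot_packed(buf, x, 3, x) for x in range(8)]
print(buf.hex(), accesses)  # 053977 [1, 1, 2, 1, 1, 2, 1, 1]
```

Only pixels 2 and 5 cross a byte boundary and need the two-byte path; the neighbouring bits are preserved by the mask.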
Quote:
Assuming you have regular DRAM with no special write-mask features, plotting a 3-bit pixel that spans any byte addressable boundary requires two reads and two writes to set every bit in the target pixel without damaging the neighbouring ones. For such an 8 colour screen, 2 pixels in each 3-byte span of 8 pixels requires this pair of 8-bit read/write accesses. The only way to do it using a single read/write access for these pixels would be to increase the access size to 16-bit, which only works if there's no penalty for misaligned access, which wouldn't be the case when the 16-bit word itself spans whatever maximum width the bus works at.

Correct. With 3-bit pixels there are always 2 misalignments to be considered, whatever the data bus size.
Quote:
That's not to say this isn't better than planar. Even the most naive implementation above requires two read/write cycles 25% of the time, whereas an equivalent planar example requires 3 read/write cycles 100% of the time.

Indeed. This is a graphic primitive where the worst-case scenario for packed is always much better than the best-case scenario for planar graphics (they only match at 2 bits per pixel, where planar also needs two accesses). And planar gets worse as the number of colors grows.
Quote:
* It's not just odd sizes that this applies to, rather any non power of 2 size. You'll always be able to set all required bits of a pixel in a single read/write for 1, 2 and 4 bit pixels. For 8, 16 and 32, it's a pure write only case, of course.

Yes. Powers-of-two gain some benefits in this specific case.

Status: Offline

Hammer

Re: Packed Versus Planar: FIGHT
Posted on 8-Aug-2022 5:42:13

[ #110 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6503
From: Australia

@Hypex

Quote:

Hypex wrote:
@Kronos

Quote:

At that point (1992) it would have been much better to focus on an official RTG solution, chip the A4000 with both AGA and a VGA chip and make it clear that whatever would have come after AGA would NOT be backward compatible at HW and lower SW level

That's what you get with a A4000 when you add a graphics card!

In fact there was an official RTG solution: the Commodore A2410, which could plug into an A2000 or A3000.

However, even though it's common in laptops today to feature dual graphic chipsets, adding VGA to the Amiga would have complicated it and added to the expense. We have to consider that VGA is a complex graphical device. It may lack sprites and other Amiga features, but it is way more complicated in other ways. Would adding that complication have been worth it? What did it provide that we needed? VGA offered 6-bit RGB and was also planar based, albeit pixel planar in VGA modes. Straight linear framebuffer modes had limitations, and the "Doom" mode wasn't known about until later. Also, VGA wasn't the only chunky hardware around: other computers like Acorns and Apples featured chunky modes without needing any extra VGA chip.

It is also foreign, so I imagine it would be hard to integrate. Adding VGA to an Amiga tends to let VGA take control, as Amiga modes become second-class. For a proper Amiga solution I think the Amiga would need to be in control of VGA, so the copper could control VGA passthrough and allow screen dragging of RTG modes. No RTG solutions did this AFAIK. RTG screen dragging was emulated through legacy raster interrupts or blitting, like on OS4.

If what we wanted from VGA was a straight out chunky mode I think we would have been better if it was just added to the chipset. At the end of the day, it reads data, combines it and feeds it into a DAC. We just needed modes that provided it with a direct CLUT index and could skip any serial/parallel conversion. Being able to read the data sequentially should have optimised the operation even if it needed alignment.

The Commodore A2410 has a Texas Instruments TMS34010; this TIGA ASIC is not low-cost compared to the ET4000AX. The Commodore A2410 didn't advance the Amiga's native chipset, and some people at Commodore were PC-centric, e.g. Commodore Germany and Bill Sydnes.

In the PC world, the TIGA died. Lower-cost clones of IBM's 8514 architecture such as ET4000 SVGA killed off TIGA.

FYI, Advanced Amiga Architecture (AAA) chipset has direct chunky 16-bit pixels (15 bits for 32768 colors and 1 bit for genlock overlay), provided by custom chip 'Monica', but Commodore wasted engineering resources with C65 as the second 256 color chipset.

C65 (256 display colors with 4096 color palette) was completed in December 1990.
AGA (256 display colors with 16M color palette) was completed in March 1991.

Time and resources were also wasted on Bill Sydnes's "A1000jr" and A600 ECS designs. ECS's limited four-color productivity display modes don't address the Amiga's core market and are inferior to the 31.5 kHz VGA 640x480 16-color mode, let alone IBM's 8514 clones/SVGA and IBM XGA.

Refer to Vampire SAGA's chunky pixels or Individual Computers' Graffiti chunky pixel extension. Graffiti changes the Amiga's bitplane graphics into a chunky pixel mode, i.e. one byte in memory represents a single pixel, hence the value of a byte selects its colour.

Modify Lisa chip with integrated Graffiti chunky pixels.

John Carmack's argument against the Amiga has the install base context.

PS: I had an Amiga 3000 (with 68030/68882 at 25 MHz) in early 1992, and I have my own beef against the fools who designed ECS.

Unlike C= Amiga Hombre, TIGA's 3D didn't target OpenGL compliance i.e. SGI has accelerated 3D leadership. PC OpenGL/Direct3D cloners killed off SGI's 3D hardware.

Last edited by Hammer on 08-Aug-2022 at 07:25 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

Hammer

Re: Packed Versus Planar: FIGHT
Posted on 8-Aug-2022 6:23:43

[ #111 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6503
From: Australia

@cdimauro

Quote:

Actually, Apollo's 68080 is an attempt to be more like Intel than AMD, due to the design decisions.

Actually, Intel's P5 Pentium X87 is not compromised i.e. 80-bit precision is available. Intel MMX is an extra feature in addition to X87 FPU.

Intel 8087 is designed to support a 32-bit "single-precision", 64-bit "double-precision" , and "80-bit" extended precision formats.

All Intel and AMD X86-64 CPUs still support X87 80-bit precision.

Vampire 1200 V2's FPU features 52-bit precision which doesn't match 68060 FPU's precision.

Apollo-Core is 68K cloner like AMD's X86 cloner position. Don't try to equate Apollo-Core as genuine "Intel" when Apollo-Core is not genuine Motorola 68K!

AMD K7 Athlon has three parallel pipelined FP execution units with three ports while Pentium III's pipelined three FP execution units are all behind one port.

Quote:

It did it, with 3DNow!

AMD K6-III's X87 function is not compromised i.e. 80 bit precision is available. AMD's 3DNow is an extra feature in addition to Intel MMX and X87 FPU support.

AMD's 3DNow Pro includes support for Intel SSE in addition to Intel MMX, X87, and AMD 3DNow support. AMD Bulldozer/Jaguar/Zen has dropped support for 3DNow.

Both Intel (since Bay-Trail) and AMD support 3DNow's PREFETCH and PREFETCHW instructions.

For Quake, AMD K6-III's X87 issue is about performance being inferior to the Pentium II counterpart, not about being X87 compliant. All AMD K5, K6 to K17 Zen are X87 compliant.
Microsoft DirectX6 (Direct3D) supports AMD's 3DNow and Intel SSE as part of its abstraction layer.

All Intel 486DX, P5 Pentium to the latest Alderlake are X87 compliant. Intel recalled the defective Pentium FDIV bug and unstable 1.13Ghz Pentium III.

Quote:

It's very difficult to achieve, since you don't just need to copy data from fast to chip memory: realistically you also have to do some other tasks / computations.

https://youtu.be/1B1jKjrRUmk
Doom on C=Amiga's Motorola 68030 @ 50 Mhz with C= AGA vs PC clone's AMD 386 @ 40 Mhz with Tseng Labs ET4000AX ISA. Result: similar performance.

Last edited by Hammer on 08-Aug-2022 at 06:44 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

Hammer

Re: Packed Versus Planar: FIGHT
Posted on 8-Aug-2022 6:59:17

[ #112 ]

Elite Member

Joined: 9-Mar-2003
Posts: 6503
From: Australia

@cdimauro

Quote:

SIMD/Vector makes sense only for 8, 16, 24 (more difficult) and 32-bit pixel sizes. It's much more difficult to use such units for less than 8 bits pixel sizes.

Strange... RDNA 2 has INT4 compute support.

https://www.techspot.com/article/2151-nvidia-ampere-vs-amd-rdna2/

It's useful for deep learning...

All mainstream GPUs use an 8-bit stencil buffer, and they are tied to the depth buffer, so you must choose a buffer format that includes stencil i.e. D24S8 primarily.

Last edited by Hammer on 08-Aug-2022 at 07:06 AM.

_________________
Amiga 1200 (rev 1D1, KS 3.2, PiStorm32/RPi CM4/Emu68)
Amiga 500 (rev 6A, ECS, KS 3.2, PiStorm/RPi 4B/Emu68)
Ryzen 9 7950X, DDR5-6000 64 GB RAM, GeForce RTX 4080 16 GB

Status: Offline

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 8-Aug-2022 20:08:48

[ #113 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@Hammer

Quote:

Hammer wrote:
@cdimauro

Quote:

Actually, Apollo's 68080 is an attempt to be more like Intel than AMD, due to the design decisions.

Actually, Intel's P5 Pentium X87 is not compromised i.e. 80-bit precision is available.

Not relevant: see below.
Quote:
Intel MMX is an extra feature in addition to X87 FPU.

That's why I've said that the Apollo's 68080 is more like Intel: it was Intel that reused the FPU to introduce its first SIMD unit, the MMX.

Besides that, the 68080 also introduced instruction fusion, which is another thing introduced by Intel (with the Banias micro-architecture).
Quote:
Intel 8087 is designed to support a 32-bit "single-precision", 64-bit "double-precision" , and "80-bit" extended precision formats.

All Intel and AMD X86-64 CPUs still support X87 80-bit precision.

Not relevant: see above.
Quote:
Vampire 1200 V2's FPU features 52-bit precision which doesn't match 68060 FPU's precision.

Not relevant.
Quote:
Apollo-Core is 68K cloner like AMD's X86 cloner position.

I don't agree, for the above reasons.
Quote:
Don't try to equate Apollo-Core as genuine "Intel" when Apollo-Core is not genuine Motorola 68K!

Never said it: see above my clarification.
Quote:
AMD K7 Athlon has three parallel pipelined FP execution units with three ports while Pentium III's pipelined three FP execution units are all behind one port.

Redundant / useless.
Quote:
Quote:

It did it, with 3DNow!

AMD K6-III's X87 function is not compromised i.e. 80 bit precision is available.

Not relevant.
Quote:
AMD's 3DNow is an extra feature in addition to Intel MMX and X87 FPU support.

That's why I've reported it.
Quote:
AMD's 3DNow Pro includes support for Intel SSE in addition to Intel MMX, X87, and AMD 3DNow support. AMD Bulldozer/Jaguar/Zen has dropped support for 3DNow.

Redundant / useless.
Quote:
Both Intel (since Bay-Trail) and AMD support 3DNow's PREFETCH and PREFETCHW instructions.

Redundant / useless.
Quote:
For Quake, the AMD K6-III's X87 issue is about performance being inferior to its Pentium II counterpart, not about X87 compliance. All AMD CPUs from the K5 and K6 to the K17 Zen are X87 compliant.
Microsoft DirectX 6 (Direct3D) supports AMD's 3DNow and Intel SSE as part of its abstraction layer.

Redundant / useless.
Quote:
All Intel CPUs, from the 486DX and P5 Pentium to the latest Alder Lake, are X87 compliant. Intel recalled the Pentiums with the defective FDIV bug and the unstable 1.13 GHz Pentium III.

Redundant / useless.
Quote:
Quote:
It's very difficult to achieve, since you don't just need to copy data from fast to chip memory: realistically you also have to do other tasks / computations.

https://youtu.be/1B1jKjrRUmk
Doom on a C= Amiga with a Motorola 68030 @ 50 MHz and C= AGA vs a PC clone with an AMD 386 @ 40 MHz and a Tseng Labs ET4000AX ISA card. Result: similar performance.

The performance isn't similar: take a closer and more careful look at the video, and you'll see that the Amiga generates fewer FPS than the PC.

It would have been good to run Doom's timedemo to see the effective framerate of both versions (with the Amiga port having introduced some optimizations, BTW).

P.S. I saw that you had the good sense to edit your comment and remove your previous statement (that the Amiga was able to do... full motion! LOL).
Quote:

Hammer wrote:
@cdimauro

Quote:

SIMD/Vector makes sense only for 8, 16, 24 (more difficult) and 32-bit pixel sizes. It's much more difficult to use such units for pixel sizes of less than 8 bits.

Strange... RDNA 2 has INT4 compute support.

https://www.techspot.com/article/2151-nvidia-ampere-vs-amd-rdna2/
It's useful for deep learning...

Guess what: the thread was about packed vs planar graphics, and not about ML/DL...

Specifically, I was talking about supporting pixel sizes of 2..7 bits in the context of a SIMD/Vector unit.

You can support them, of course, but you have to introduce proper, specific instructions for all those cases, because they are "odd" (here it makes sense to talk about oddities). And you know it: this complicates the SIMD units, especially if you want to introduce full support for such sizes, which means making them "first-class citizens" like the regular sizes: 8, 16, 32 and 64 bits.

Or you have to resort to multiple SIMD instructions to extract and/or insert the proper colors, which means slower execution and wasted SIMD register space (since you have to unpack the colors if you want to manipulate them).

Whatever the implementation, there are always cons. And I repeat: in the context of a SIMD/vector unit.
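To make the "odd sizes" point concrete, here is a minimal scalar sketch in Python (not SIMD code) of packing and unpacking pixels of 1..8 bits. It assumes pixels are packed MSB-first into a plain byte stream with no padding between them; `pack_pixels` and `unpack_pixels` are hypothetical helpers, not any real API. Because a 3-bit or 5-bit pixel can straddle a byte boundary, every pixel needs its own window load, shift and mask, which is the per-lane work a SIMD unit would either need dedicated instructions for or would have to emulate with extra instructions.

```python
def pack_pixels(pixels: list[int], bits: int) -> bytes:
    """Pack pixel values of `bits` width (1..8) MSB-first, zero-padding the last byte."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in 1..8")
    acc = 0
    for p in pixels:
        if not 0 <= p < (1 << bits):
            raise ValueError("pixel value does not fit in the given width")
        acc = (acc << bits) | p
    total = len(pixels) * bits
    pad = (-total) % 8                      # bits needed to reach a whole byte
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack_pixels(data: bytes, bits: int, count: int) -> list[int]:
    """Extract `count` pixels of `bits` width from MSB-first packed `data`."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in 1..8")
    if count * bits > len(data) * 8:
        raise ValueError("not enough data")
    mask = (1 << bits) - 1
    pixels = []
    for i in range(count):
        byte, offset = divmod(i * bits, 8)   # containing byte and bit offset inside it
        # Load a 16-bit window so a pixel straddling two bytes is fully covered.
        hi = data[byte]
        lo = data[byte + 1] if byte + 1 < len(data) else 0
        window = (hi << 8) | lo
        shift = 16 - offset - bits           # move the pixel down to bit 0
        pixels.append((window >> shift) & mask)
    return pixels
```

For example, eight 3-bit pixels 1..7, 0 pack into the three bytes 0x29 0xCB 0xB8, and the third pixel already straddles the first two bytes. With 8-bit pixels the per-pixel shift is always the same and the loop degenerates into a plain byte copy, which is why 8/16/32-bit lanes are the natural fit for SIMD.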
Quote:
All mainstream GPUs use an 8-bit stencil buffer, and it is tied to the depth buffer, so you must choose a buffer format that includes stencil, i.e. primarily D24S8.

Redundant / useless.

I don't understand why you fill your comments with useless information. Is it your way of emulating Mega RJ with his padding?

bhabbott

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 2:48:14

[ #114 ]

Cult Member

Joined: 6-Jun-2018
Posts: 554
From: Aotearoa

@cdimauro

Quote:

cdimauro wrote:
@bhabbott

Quote:

bhabbott wrote:

But how much better might chunky be for 2D games and other applications? To answer that question we need code!

Code is not needed: math (and analysis) should be enough to prove it.

"The proof of the pudding is in the eating."

Math alone will not be enough unless it includes the overhead of a typical implementation - including the code. And I want real proof, i.e. a working example that can be tested on a real machine.

Seems I will have to do it myself though, because this thread has devolved into yet another session of Amiga bashing from 'fans' and the delusional fantasies of armchair chip designers.

cdimauro

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 5:05:30

[ #115 ]

Elite Member

Joined: 29-Oct-2012
Posts: 4432
From: Germany

@bhabbott

Quote:

bhabbott wrote:
@cdimauro

Quote:

cdimauro wrote:

Code is not needed: math (and analysis) should be enough to prove it.

"The proof of the pudding is in the eating."

Math alone will not be enough unless it includes the overhead of a typical implementation - including the code.

By this "logic", to prove that black holes exist you'd have to touch them. Welcome!
Quote:
And I want real proof, i.e. a working example that can be tested on a real machine.

Let me reveal a secret: there's no real machine that implements packed graphics mimicking an Amiga...
Quote:
Seems I will have to do it myself though, because this thread has devolved into yet another session of Amiga bashing from 'fans' and the delusional fantasies of armchair chip designers.

Repeating the same mantra every time you're disappointed because someone dared (!) to criticize your beloved Amiga only further proves what a blind fanatic you are.

BTW, someone already posted some code in the thread, which you clearly ignored because... it is NOT convenient (too bad for the Amiga). This just shows your intellectual honesty...

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 10:38:45

[ #116 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@thread

This isn't about bashing the Amiga; it's a thought experiment around alternative implementations of one of the Amiga's more unique features (allowing 1-8 bits per pixel colour modes).

The Amiga has had access to 8-bit packed pixels (and deeper colour depths) via expansion cards for decades.
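For anyone following along, the difference between the two layouts can be sketched roughly like this. In this hypothetical model (the function names and the list-of-bytearrays layout are assumptions, not AmigaOS structures), a planar bitmap is a list of `depth` bitplanes, each row-major with the leftmost pixel in the most significant bit of each byte, and plane n supplying bit n of the colour index; a chunky/packed 8-bit bitmap stores one byte per pixel. Reading or writing one pixel touches `depth` separate bytes in planar but one byte in chunky, while adding a plane to the planar bitmap doubles the colours without moving any existing bits.

```python
def planar_get(planes: list[bytearray], bytes_per_row: int, x: int, y: int) -> int:
    """Colour index of pixel (x, y): gather one bit from every plane."""
    offset = y * bytes_per_row + x // 8
    bit = 7 - (x % 8)                      # leftmost pixel = most significant bit
    colour = 0
    for n, plane in enumerate(planes):     # plane n supplies bit n of the colour
        colour |= ((plane[offset] >> bit) & 1) << n
    return colour


def planar_set(planes: list[bytearray], bytes_per_row: int, x: int, y: int, colour: int) -> None:
    """Write pixel (x, y): a read-modify-write in every plane."""
    offset = y * bytes_per_row + x // 8
    bit = 7 - (x % 8)
    for n, plane in enumerate(planes):
        if (colour >> n) & 1:
            plane[offset] |= 1 << bit
        else:
            plane[offset] &= ~(1 << bit) & 0xFF


def chunky_get(pixels: bytearray, width: int, x: int, y: int) -> int:
    """8-bit chunky/packed: one byte per pixel, a single access."""
    return pixels[y * width + x]


def planar_to_chunky(planes: list[bytearray], width: int, height: int) -> bytearray:
    """Naive planar-to-chunky conversion (width assumed to be a multiple of 8)."""
    bytes_per_row = width // 8
    return bytearray(planar_get(planes, bytes_per_row, x, y)
                     for y in range(height) for x in range(width))
```

Real conversion routines work on whole words at a time with merge/shift tricks rather than per pixel; this only shows the data layout and where the per-plane cost comes from.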

_________________
Doing stupid things for fun...

Hypex

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 14:24:59

[ #117 ]

Elite Member

Joined: 6-May-2007
Posts: 11351
From: Greensborough, Australia

@matthey

Quote:
The 1991 CBM A2410 graphics card was not for the low-end C64-replacement Amiga but for the high-end CBM Unix/BSD workstations that just happened to use Amiga hardware. It's obviously not for the Amiga because it has 2MiB of dual-ported VRAM, doubling the memory bandwidth, which the Amiga custom chips needed and which CBM denied to Jay Miner for his Ranger chipset back in 1986-1987. The specs allowed higher resolutions and 256 colors while improving graphics performance. The A2410 beat AGA to market; AGA increased the memory bandwidth in a cheaper but more restrictive way. Even though the A2410 beat the late AGA to market, it was not new technology. The TI TMS34010 was released in 1986 and the TMS34020 with 3D support was released in 1988; the latter was used for virtual reality on the Amiga with the Rambrandt Amiga extension card from Progressive Peripherals & Software.

It wasn't for the low end, but my reply was about a higher-end Amiga model. Still, it has some interesting features. It was created around the same time as the Amiga and in some respects looks superior in what it offers. It makes me wonder whether the Amiga would have turned out better using these chips rather than putting the effort into custom designing what it had.

Quote:
CBM was more interested in cost-reducing the Amiga into a C64 at all costs, but by the time they nearly achieved their goal the technology was practically outdated because they hadn't enhanced it.

The Amiga was no C64. Apart from the Amiga not being a real Commodore in the C64 sense, it was too expensive. At the time of the Amiga they had just completed the hard work on the C128, which tried to be a C64. The Amiga didn't make sense compared to that.

Quote:
Early (S)VGA hardware lacked flexibility and programmability but became more flexible over time, as did the Amiga graphics, where more resolutions, formats and pixel clocks were added until practically fully programmable modes could be created, limited more by memory bandwidth than anything else. Despite PC graphics hardware being foreign to the Amiga, ThoR managed to get relatively low-cost screen dragging, with some limitations, working on nearly all P96 RTG-supported graphics cards, some of which use (S)VGA hardware. Monitor switcher and pass-through settings and multiple display support, including dragging from one display to another, provide the Amiga with relatively modern RTG features even on ancient hardware. Fully integrated graphics support provides the best experience, and this is where CBM dropped the BoingBall.

I've seen examples of this and it does work. But without a copper it would be less efficient, I imagine. Also, VGA could do raster interrupts, but that's a rather primitive method. So could a C16, and not many thought doing screen splits with BASIC commands on it was special. OS4 builds on the P96 screen dragging, but it would be blitted somehow. I suppose I don't think anything can compare to live copper effects; rendering to a back buffer or screen just isn't the same.

Quote:
Chunky/packed blitters exist and they mostly move, mask and shift data too. If the Amiga blitter could be enhanced to support chunky/packed data, it would be interesting to compare to using the CPU and SIMD/vector unit (Amigas with 68020+ CPUs often switched to CPU blitting). The blitter has longer setup times than using the CPU so a SIMD/vector unit which can saturate the memory bandwidth may have an advantage (this is the choice of the Vampire/Apollo hardware).

I read they did this on Apollo. Of course, they still have to support the blitter in hardware, but did they increase its speed? Even so, using a CPU for blit operations still looks backwards to me. I mean, I would compare it with software 3D like Doom against hardware 3D. I just like dedicated hardware.

Also, there were plans for a blitter per plane. Parallel sounds good, but would it have sped things up? I tend to think one operation that blits all planes would be useful. I suppose parallel blitters would have covered that.

Quote:
The original Amiga could have switched to the more expensive 68020 released in 1984 and used bitfield instructions while removing the blitter from the custom chips to save space. The 68000 had anemic shift performance necessitating the blitter while the 68020 was a big improvement even though the bitfield instructions were somewhat slow (they provide even misaligned mask and shift in a single instruction though). The 68040 improved the bitfield instruction performance while the 68060 improved shift performance up to 2 shifts per cycle but the bitfield instructions were never optimized so manual mask and shift is fastest despite the better code density when using the bitfield instructions.

Yes, the problem is that the 68020 was more expensive, even if Commodore had produced it themselves. I still like the idea of the blitter, though it's more suited to large block transfers. It also does line drawing. But I don't know if bitfields would have been speedy enough. I was doing some bitfield testing on my A4000/060 recently and found them useful but slow. I ended up changing back to loading and shifting, which worked faster. Still, bitfields are said to be how Virtual GP could do fast texture mapping with planar graphics, so perhaps I did something wrong.
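To illustrate the trade-off being discussed, here is a rough Python model (not 68k code) of the two ways to pull an arbitrary bit field out of memory. `bfextu_model` mimics the semantics of the 68020 BFEXTU instruction for non-negative offsets (offset counted from the most significant bit of the base byte, width 1..32, result zero-extended); `load_shift_mask` is the manual variant that loads aligned 32-bit longwords and shifts and masks. Both names are hypothetical, and the model says nothing about cycle counts: it only shows that the manual version is a single load plus shift and AND whenever the field stays inside one longword, and needs a second load and a merge only when it crosses a boundary.

```python
def bfextu_model(mem: bytes, offset: int, width: int) -> int:
    """BFEXTU-style extract: unsigned field of `width` bits (1..32) starting
    `offset` bits after the MSB of mem[0] (big-endian bit numbering)."""
    if not 1 <= width <= 32 or offset < 0 or offset + width > len(mem) * 8:
        raise ValueError("field out of range")
    first = offset // 8
    last = (offset + width - 1) // 8            # may touch up to 5 bytes
    chunk = int.from_bytes(mem[first:last + 1], "big")
    shift = (last + 1) * 8 - (offset + width)   # bits to the right of the field
    return (chunk >> shift) & ((1 << width) - 1)


def load_shift_mask(mem: bytes, offset: int, width: int) -> int:
    """Manual alternative: load the aligned longword(s) holding the field,
    then shift and mask (roughly a MOVE.L / LSR.L / AND.L sequence)."""
    if len(mem) % 4 or not 1 <= width <= 32 or offset < 0 or offset + width > len(mem) * 8:
        raise ValueError("field out of range")
    index, bit = divmod(offset, 32)
    mask = (1 << width) - 1
    long0 = int.from_bytes(mem[index * 4:index * 4 + 4], "big")
    if bit + width <= 32:
        # Common case: one load, one shift, one AND.
        return (long0 >> (32 - bit - width)) & mask
    # The field crosses a longword boundary: a second load and a merge are needed.
    long1 = int.from_bytes(mem[index * 4 + 4:index * 4 + 8], "big")
    return (((long0 << 32) | long1) >> (64 - bit - width)) & mask
```

In a pixel loop the offset usually advances by a fixed step, so the manual version can often keep the longword in a register and avoid the boundary case for most pixels, which may be part of why loading and shifting came out faster here.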

Quote:
Sure, but were there any chunky/packed blitters which supported misaligned pixels? Doesn't the fact that they were not popular or did not exist tell the story about their practicality?

Not that I know of, but I was referring to the display controller first and the blitter second. In any case, misaligned pixels should pose no problem: just mask out any edges and fill everything in between. If the source data is misaligned, just shift it like in planar. The only real problem I see would be if the source has a different pixel width, in which case it would need to scale the pixels and spend time packing them in, which is 3D territory. As it happens, planar has no problem with different depths, since you just blit the planes you need.
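The "mask out the edges and fill everything in between" idea can be sketched for packed pixels whose width divides 16 (1, 2, 4 or 8 bits), stored MSB-first in 16-bit words, the same word size the Amiga blitter works in. The helper names `pattern_word` and `fill_span`, the list-of-words row and the restriction to widths dividing 16 are all assumptions to keep the sketch short; with 3, 5, 6 or 7-bit pixels a pixel could also straddle two words, which is the extra complication discussed earlier in the thread. Only the first and last words need a partial mask, much like the blitter's first/last word masks; interior words are written whole.

```python
WORD_BITS = 16


def pattern_word(colour: int, bits: int) -> int:
    """Replicate `colour` across a 16-bit word (e.g. 4-bit colour 0x5 -> 0x5555)."""
    word = 0
    for _ in range(WORD_BITS // bits):
        word = (word << bits) | colour
    return word


def fill_span(row: list[int], x0: int, x1: int, colour: int, bits: int) -> None:
    """Fill pixels x0..x1-1 of `row` (16-bit words, MSB = leftmost pixel) in place."""
    if WORD_BITS % bits:
        raise ValueError("sketch only handles pixel widths that divide 16")
    if not 0 <= colour < (1 << bits):
        raise ValueError("colour out of range")
    if x0 >= x1:
        return
    fill = pattern_word(colour, bits)
    start_bit, end_bit = x0 * bits, x1 * bits              # half-open bit range
    first, last = start_bit // WORD_BITS, (end_bit - 1) // WORD_BITS
    for w in range(first, last + 1):
        base = w * WORD_BITS
        lo = max(start_bit, base) - base                   # first covered bit, from the MSB
        hi = min(end_bit, base + WORD_BITS) - base         # one past the last covered bit
        # Ones over bit positions lo..hi-1 counted from the MSB; 0xFFFF for interior words.
        mask = ((1 << (hi - lo)) - 1) << (WORD_BITS - hi)
        row[w] = (row[w] & ~mask & 0xFFFF) | (fill & mask)
```

For example, filling pixels 1..9 of a 12-pixel, 4-bit row with colour 0xA gives the words 0x0AAA, 0xAAAA, 0xAA00: a partial first word, a full middle word and a partial last word.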

I tend to think, even if possible, that the benefit would not outweigh any practical advantage. If the extra logic could be used to support 8-bit colour instead, and it was simpler to do so, then I think it would be better and more practical than supporting some obscure widths. But I'm not a chip designer, so the logic may be easier than I imagine it to be.

Last edited by Hypex on 09-Aug-2022 at 02:34 PM.

Karlos

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 18:51:10

[ #118 ]

Elite Member

Joined: 24-Aug-2003
Posts: 4958
From: As-sassin-aaate! As-sassin-aaate! Ooh! We forgot the ammunition!

@Hypex

Quote:
I mean, I would compare it with software 3d like Doom against hardware 3d. I just like dedicated hardware

Are you seriously telling me you didn't use FBlit patches on your Amiga? As soon as you got to a fast 020 (e.g. 28MHz with fast RAM) the game was up for the AGA blitter when doing general Workbench stuff. On faster processors (030/50MHz and above) you'd be mad not to use it. The performance difference is night and day.

MEGA_RJ_MICAL

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 23:16:31

[ #119 ]

Super Member

Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

PLANAR PADDING

MEGA_RJ_MICAL

Re: Packed Versus Planar: FIGHT
Posted on 9-Aug-2022 23:16:47

[ #120 ]

Super Member

Joined: 13-Dec-2019
Posts: 1200
From: AMIGAWORLD.NET WAS ORIGINALLY FOUNDED BY DAVID DOYLE

PACKED PADDING

Amigaworld.net was originally founded by David Doyle