It's missing many details, including turning off preemptive multitasking for the process calling Forbid(), but the main idea is right. Rather than have the other cores spin on spinlocks, it may be better to put them in a low-power stopped mode that listens for interrupts (many processors have an instruction for this, like the 68060 LPSTOP instruction), provided the Permit() IPIs can wake them up.
If you have a look at the AROS code, e.g. here, at kernel_ipi.h, you will notice we are already working on that.
Yes, such a Forbid()/Permit() could be done, but it would be a massive performance loss, since you would replace one byte increment (TDNestCnt) with a byte increment followed by a full stop on all cores: the forbidding core sends an IPI to all other CPUs, then each CPU has to confirm that it is entering the Forbid/Stop state, and the forbidding core has to make absolutely sure all other cores are stopped, i.e. wait for all of them to send an IPI back, before it returns to the caller. Permit() would be less restrictive, since there would be no need to wait for or send confirmation IPIs.
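To make the cost concrete, here is a rough sketch of that handshake as a single-threaded simulation in C. It is not AROS code and has nothing to do with what is in kernel_ipi.h: `struct machine`, `smp_forbid()`, `smp_permit()` and the other names are made up for illustration, the "IPIs" are just flags and counters, and the nesting rule (only the outermost call stops or wakes the other cores) is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CORES 8

enum core_state { CORE_RUNNING, CORE_STOPPED };

/* Hypothetical model of an SMP machine; not an AROS structure. */
struct machine {
    enum core_state state[MAX_CORES];
    bool stop_pending[MAX_CORES]; /* stop IPI delivered, not yet handled */
    int ncores;
    int acks;             /* confirmations seen by the forbidding core */
    int ipis_sent;        /* total IPIs, to make the cost visible */
    signed char nest_cnt; /* stands in for TDNestCnt; -1 = not forbidden */
};

void machine_init(struct machine *m, int ncores)
{
    assert(ncores >= 1 && ncores <= MAX_CORES);
    m->ncores = ncores;
    m->acks = 0;
    m->ipis_sent = 0;
    m->nest_cnt = -1;
    for (int i = 0; i < ncores; i++) {
        m->state[i] = CORE_RUNNING;
        m->stop_pending[i] = false;
    }
}

/* A remote core handling a pending stop request: stop, then confirm. */
static void core_handle_ipi(struct machine *m, int id)
{
    if (!m->stop_pending[id])
        return;
    m->stop_pending[id] = false;
    m->state[id] = CORE_STOPPED;
    m->acks++;      /* confirmation IPI back to the forbidding core */
    m->ipis_sent++;
}

void smp_forbid(struct machine *m, int self)
{
    if (++m->nest_cnt > 0)
        return; /* nested call (assumption): others are already stopped */

    m->acks = 0;
    for (int i = 0; i < m->ncores; i++) {
        if (i != self) {
            m->stop_pending[i] = true; /* stop IPI to every other core */
            m->ipis_sent++;
        }
    }
    /* The caller may not return until every other core has confirmed. */
    while (m->acks < m->ncores - 1)
        for (int i = 0; i < m->ncores; i++)
            if (i != self)
                core_handle_ipi(m, i);
}

void smp_permit(struct machine *m, int self)
{
    if (--m->nest_cnt >= 0)
        return; /* still inside an outer Forbid() */

    /* Wake-up IPIs only; no confirmation is awaited. */
    for (int i = 0; i < m->ncores; i++) {
        if (i != self && m->state[i] == CORE_STOPPED) {
            m->state[i] = CORE_RUNNING;
            m->ipis_sent++;
        }
    }
}

int running_cores(const struct machine *m)
{
    int n = 0;
    for (int i = 0; i < m->ncores; i++)
        if (m->state[i] == CORE_RUNNING)
            n++;
    return n;
}
```

In this model, with N cores, one outermost Forbid() costs 2*(N-1) IPIs plus the wait for the slowest core, and the matching Permit() costs another N-1, where the single-core version costs one byte increment and one byte decrement.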
The problem is, even if you try to replace as many Forbid()/Permit() pairs as possible with semaphores, there will still be plenty of these nasty calls left.