To me it sounds like sparse matrix multiplication repackaged as "event-driven spiking computation", where the spikes are simply the non-zero elements that sparse GPU kernels have always been designed to process.
The supposedly dynamic/temporal nature of the model does not seem to be used during GPU execution; it collapses into a single static computation, equivalent to applying a pre-calculated sparsity mask.
Perhaps a bit cynical of me, but it feels like wrapping standard sparse computing and operator fusion in complex biological jargon...
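A minimal sketch (NumPy) of what that claim amounts to, assuming "spikes" are just thresholded activations at the tensor level; the names, shapes and the 0.5 threshold are made up for illustration and are not taken from the SpikingBrain code:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 128))           # dense weight matrix
    x = rng.normal(size=128)                 # pre-activation input

    spike_mask = (x > 0.5).astype(x.dtype)   # "spikes" = thresholded activations
    y_spiking = W @ (x * spike_mask)         # "event-driven" layer ...
    y_masked = W @ np.where(x > 0.5, x, 0)   # ... is just a masked dense matmul

    assert np.allclose(y_spiking, y_masked)

Both lines land in the same kind of masked matmul kernel on a GPU, which is exactly the equivalence being pointed out above.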
The 'brain-inspired' community has been doing this ever since Carver Mead introduced the term 'neuromorphic' in the late 1980s: reselling banalities as great new insights. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success. Failure to make progress in AI and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: in this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said they don't fund 'neuromorphic'. I am not sure about the details ...) The whole 'brain talk' malarkey goes back way longer.
In particular, psychology and related subjects have, since their origins as specialties in the 19th century, made heavy use of brain-inspired metaphors that were intended to mislead. This was already criticised in the 19th century. See [3] for an interesting discussion.
There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.
[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...
[2] https://en.wikipedia.org/wiki/MetaX
[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6
I believe the argument is that you can also encode information in the time domain.
If we just look at spikes as a different numerical representation, then they are clearly inferior. For example, encoding the number 7 as a spike count requires seven consecutive pulses on a single spiking line, while encoding it in binary requires a single time step with one pulse on each of three parallel lines.
Binary encoding wins 7x in speed and 7/3 ≈ 2.33x in pulse count, and hence power efficiency...
On the other hand, if we assume that we are able to encode information in the gaps between pulses, then things quickly change.
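As a quick sanity check on those numbers, a toy pulse-count comparison (purely illustrative accounting of the two encodings described above, not a hardware model):

    def rate_code(n: int):
        # one line, n consecutive time steps, one pulse per step
        return n, n                      # (pulses, time_steps)

    def binary_code(n: int, lines: int = 3):
        # `lines` parallel lines, one time step, a pulse per set bit
        pulses = format(n, f"0{lines}b").count("1")
        return pulses, 1                 # (pulses, time_steps)

    r_pulses, r_steps = rate_code(7)     # 7 pulses over 7 steps
    b_pulses, b_steps = binary_code(7)   # 3 pulses in 1 step
    print(r_steps / b_steps)             # 7.0x faster
    print(r_pulses / b_pulses)           # ~2.33x fewer pulses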
I think the main benefit of a neuromorphic design would be to make it dataflow driven (asynchronous event driven - don't update neuron outputs unless their inputs change) rather than synchronous, which is the big power efficiency unlock. This doesn't need to imply a spiking design though - that seems more of an implementation detail, at least as far as dataflow goes. Nature seems to use spike firing rates to encode activation strength.
In the brain the relative timing/ordering of different neurons asynchronously activating (A before B, or B before A) is also used (spike-timing-dependent plasticity - STDP) as a learning signal to strengthen or weaken connection strengths, presumably to learn sequence prediction in this asynchronous environment.
STDP also doesn't imply that spikes, or the inter-spike timings within a single neuron's spike train, are necessary - an activation event with a strength and a timestamp would seem to be enough to implement a digital dataflow design, although ultimately a custom analog design may be more efficient.
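A hedged sketch of what such an event-driven (dataflow) update with an STDP-flavoured learning rule could look like in software; every name, threshold and time constant here is invented for illustration and is not taken from SpikingBrain or any particular neuromorphic chip:

    import math
    from dataclasses import dataclass

    @dataclass
    class Event:
        source: int      # id of the input that fired / changed
        strength: float  # activation strength carried by the event
        time: float      # timestamp of the event

    class DataflowNeuron:
        def __init__(self, n_inputs: int, threshold: float = 0.4):
            self.w = [0.5] * n_inputs
            self.last_input_time = [None] * n_inputs
            self.threshold = threshold

        def on_event(self, ev: Event, tau: float = 20.0, lr: float = 0.01):
            # called only when an input changes -- no clocked updates
            self.last_input_time[ev.source] = ev.time
            drive = self.w[ev.source] * ev.strength
            if drive < self.threshold:
                return None              # nothing to propagate downstream
            # output fires: potentiate inputs that arrived shortly before it
            for i, t_pre in enumerate(self.last_input_time):
                if t_pre is not None and t_pre <= ev.time:
                    self.w[i] += lr * math.exp(-(ev.time - t_pre) / tau)
            return Event(source=-1, strength=drive, time=ev.time)

    n = DataflowNeuron(n_inputs=2)
    print(n.on_event(Event(source=0, strength=1.0, time=3.2)), n.w)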
>The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.
Isn't that in essence very similar to Quantization-Aware Training (QAT)?
Can you explain more? Why would that be the case? What is being passed from one layer to the next is not a linear value but the delay until the next spike, which is very different.
But I understand that they simulate the spikes as integer events in the forward pass (as described here: https://github.com/BICLab/Int2Spike) and calculate a continuous gradient based on high-resolution weights for the backward pass.
This seems very similar to the straight-through-estimator (STE) approach that is usually used for quantization-aware training. I may be wrong though.
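For reference, here is the generic STE pattern being alluded to, sketched in PyTorch (the standard trick, not the actual Int2Spike code): quantize activations to a few integer "spike" levels in the forward pass, and pass the gradient through unchanged in the backward pass.

    import torch

    class SpikeQuant(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, levels: int = 4):
            # forward: round activations to a small number of integer levels
            return torch.clamp(torch.round(x * levels), 0, levels) / levels

        @staticmethod
        def backward(ctx, grad_output):
            # backward: identity gradient w.r.t. x (straight-through), none for `levels`
            return grad_output, None

    x = torch.rand(8, requires_grad=True)
    SpikeQuant.apply(x).sum().backward()
    print(x.grad)   # all ones: the quantization step is invisible to the gradient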
SpikingBrain treats 'spikes' as 1-bit quantization labels. True neural-level sparsity should be input-dependent, time-resolved, and self-organized during learning. If a new circuit diagram cannot 'grow' with every forward pass, then don't blame people for treating it as just another round of sparse marketing - oh wait, neuromorphic marketing.
The brain is doing shit like this.
Also known as a serial interface. They are very successful: PCIe lane, SATA, USB.
SNNs are more similar to pulse density modulation (PDM), if you are looking for an electronic equivalent.
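If it helps, a toy illustration of the PDM analogy (a first-order sigma-delta loop; the pulse density, not any individual pulse, carries the value, much like a rate-coded spike train):

    def pdm(value: float, n_steps: int = 20):
        # first-order sigma-delta: emit a pulse whenever the accumulator wraps
        acc, bits = 0.0, []
        for _ in range(n_steps):
            acc += value
            if acc >= 1.0:
                bits.append(1)
                acc -= 1.0
            else:
                bits.append(0)
        return bits

    stream = pdm(0.35)
    print(stream, sum(stream) / len(stream))   # density comes out around 0.35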
https://en.wikipedia.org/wiki/MetaX
They have GPU manufacturers that nobody in the west has ever heard of.