TorchLean: Formalizing Neural Networks in Lean

(leandojo.org)

69 points | by matt_d 2 days ago

2 comments

  • measurablefunc 6 hours ago
    I guess the next step would be adding support for quantized arithmetic.
    • woctordho 21 minutes ago
      It would be good if we could use formal verification to determine the extent to which quantization overflows in intermediate results. There are some well-known, annoying bugs where SageAttention (int8-quantized attention) works on some models but produces black images on others because of overflow, and currently no one knows how to use it in training. There should be a better way to prevent this.
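The overflow hazard described here can be sketched in a few lines of NumPy; this is a generic int8 illustration, not SageAttention's actual kernels:

```python
import numpy as np

# Generic illustration of the hazard (not SageAttention's actual kernels):
# int8 products must be accumulated in a wider type, or they wrap silently.
a = np.full(64, 100, dtype=np.int8)
b = np.full(64, 100, dtype=np.int8)

wide = np.dot(a.astype(np.int32), b.astype(np.int32))
narrow = a * b  # elementwise int8: 100 * 100 = 10000 wraps to 16 (mod 256)

print(wide)        # 640000, the correct dot product
print(narrow[0])   # 16, silently wrong
```

A formal tool would have to bound the magnitudes of intermediate results to rule out the `narrow` case.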
    • godelski 2 hours ago
      FYI, floats are already quantized. They aren't continuous or infinite. Even the distribution of representable numbers isn't uniform (it's denser in [-1, 1]).
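The non-uniform spacing is easy to see with `math.ulp` from the Python standard library (a generic illustration, not tied to TorchLean):

```python
import math

# The gap between adjacent representable doubles grows with magnitude, so
# representable floats are denser near zero (generic illustration).
for x in [0.5, 1.0, 1e6, 1e12]:
    print(x, math.ulp(x))
```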
      • measurablefunc 2 hours ago
        The standard definition of quantized arithmetic for neural networks is not the same as the one used for single- or double-precision floating-point values in the IEEE 754 standardization of "real" arithmetic: https://arxiv.org/abs/1712.05877
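A minimal sketch of the affine scheme from the linked paper, where a real value r is represented as r ≈ S·(q − Z); the function names here are illustrative, not the paper's:

```python
import numpy as np

# Affine quantization as in the linked paper (arXiv:1712.05877):
# a real value r is represented as r ≈ S * (q - Z), with float scale S
# and integer zero-point Z. Function names are ours, not the paper's.
def quantize(r, S, Z, qmin=-128, qmax=127):
    q = np.clip(np.round(r / S) + Z, qmin, qmax)
    return q.astype(np.int8)

def dequantize(q, S, Z):
    return S * (q.astype(np.float32) - Z)

r = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
S, Z = 1.0 / 127, 0
q = quantize(r, S, Z)
print(q)                      # integer codes
print(dequantize(q, S, Z))    # values recovered up to quantization error
```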
    • pstoll 6 hours ago
      And the lower precision float variants.
  • westurner 4 hours ago
    How could this lend insight into why Fast Fourier Transform approximates self-attention?

    > Because self-attention can be replaced with FFT for a loss in accuracy and a reduction in kWh [1], I suspect that the Quantum Fourier Transform can also be substituted for attention in LLMs.

    [1] "FNet: Mixing Tokens with Fourier Transforms" (2021), https://arxiv.org/abs/2105.03824; "Google Replaces BERT Self-Attention with Fourier Transform: 92% Accuracy, 7 Times Faster on GPUs", https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-ba...
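FNet's mixing sublayer, per the linked paper, is tiny to sketch: self-attention is replaced by the real part of a 2D FFT over the sequence and hidden dimensions (a NumPy illustration, not the authors' code):

```python
import numpy as np

# FNet's mixing sublayer per the linked paper: replace self-attention with
# the real part of a 2D FFT over the sequence and hidden dimensions.
def fnet_mixing(x):
    # x: (seq_len, d_model) real-valued activations
    return np.real(np.fft.fft2(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16)).astype(np.float32)
print(fnet_mixing(x).shape)  # same shape as the input, tokens globally mixed
```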

    "Why formalize mathematics – more than catching errors" (2025) https://news.ycombinator.com/item?id=45695541

    Can the Quantum Fourier Transform (QFT) and its inverse (IQFT) also be substituted for self-attention in LLMs, and do Lean formalizations provide any insight into how or why?

    • wasabi991011 1 hour ago
      > Because self-attention can be replaced with FFT for a loss in accuracy and a reduction in kWh [1], I suspect that the Quantum Fourier Transform can also be substituted for attention in LLMs.

      Couldn't figure out where you are quoting this from.

      > Can the QFT Quantum Fourier Transform (and IQFT Inverse Quantum Fourier Transform) also be substituted for self-attention in LLMs

      No. The quantum Fourier transform is just a particular factorization of the discrete Fourier transform as run on a quantum computer. It's no faster if you run it on a classical computer. And running (part of) an LLM on a quantum computer would be more expensive, because loading arbitrary classical data into a quantum computer is expensive.
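Concretely, the unitary that the QFT applies on n qubits is just the normalized DFT matrix on 2**n amplitudes, which a few lines of NumPy make explicit (generic illustration):

```python
import numpy as np

# The QFT on n qubits applies the same unitary as the normalized DFT on
# 2**n amplitudes; classically it is just this matrix.
n = 3
N = 2 ** n
omega = np.exp(2j * np.pi / N)
F = np.array([[omega ** (j * k) for k in range(N)] for j in range(N)]) / np.sqrt(N)

print(np.allclose(F @ F.conj().T, np.eye(N)))  # unitary, as a QFT must be
```

The quantum speedup comes only from the circuit factorization of this matrix, not from the transform itself being a different operation.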

    • gyrovagueGeist 3 hours ago
      Isn't this just standard Fourier theory: applying dense global convolutions via pointwise operations in frequency space? There's no mystery here. It's no different from a more general learnable parameterization of Efficient Channel Attention (ECA).
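The convolution-theorem point can be checked directly (a generic NumPy sketch, not tied to any of the cited models):

```python
import numpy as np

# Convolution theorem: pointwise multiplication in frequency space equals
# circular convolution over tokens -- the dense global mixing in question.
rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
k = rng.standard_normal(N)

via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))
direct = np.array([sum(x[j] * k[(i - j) % N] for j in range(N)) for i in range(N)])
print(np.allclose(via_fft, direct))  # True
```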
      • godelski 2 hours ago

          > There’s no mystery here.
        
        Yes and no. Yes, no mystery, because for some reason there's this belief that studying math is useless, and that by suggesting it's worthwhile you're gatekeeping. But also no, because there are some deeper and more nuanced questions. Of course there are, but for some reason we are proud of our black boxes and act like there's no other way.