Audio Reactive LED Strips Are Diabolically Hard

(scottlawsonbc.com)

114 points | by surprisetalk 1 day ago

18 comments

doctorhandshake 1 hour ago
I like this writeup but I feel like the title doesn't really tell you what it's about ... to me it's about creativity within constraints.
The author finds, as many do, that naive or first-approximation approaches fail within certain constraints and that more complex methods are necessary to achieve simplicity. He finds, as I have, that for things that are themselves perceptual and spectral, the perceptual and spectral domains are a better space to work in than the raw data.
What I don't see him get to (might be the next blog post, IDK), is getting into constraints in the use of color - everything is in 'rainbow town' as we say, and it's there that things get chewy.
I'm personally not a fan of emissive green LED light in social spaces. I think it looks terrible and makes people look terrible. Just a personal thing, but putting it into practice with these sorts of systems is challenging as it results in spectral discontinuities and immediately requires the use of more sophisticated color systems.
I'm also about maximum restraint in these systems - if they have flashy tricks, I feel they should do them very very rarely and instead have durational and/or stochastic behavior that keeps a lot in reserve and rewards closer inspection.
I put all this stuff into practice in a permanent audio-reactive LED installation at a food hall/ nightclub in Boulder: https://hardwork.party/rosetta-hall-2019/
- scottlawson 49 minutes ago
  I didn't go into much detail about it but there's a whole rabbit hole of color theory and color models. For example, the spectrum effect assigns different colors to different frequency bins, but also adjusts the assignment over time to avoid a static looking effect. It does this by rotating a "color angle" kind of like the HSL model.
  I really like your LED installation in Rosetta Hall, it looks beautiful!
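The rotating "color angle" idea described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `spectrum_colors` and the `rotation_hz` parameter are hypothetical, and the fixed lightness and saturation values are assumptions.

```python
import colorsys

def spectrum_colors(num_bins, t, rotation_hz=0.05):
    """Return one (r, g, b) tuple (0-255) per frequency bin at time t (seconds).

    rotation_hz is a made-up parameter: full turns of the hue wheel per second.
    """
    colors = []
    for i in range(num_bins):
        # Spread the bins evenly around the hue wheel, then rotate the
        # whole wheel over time so the assignment never looks static.
        hue = (i / num_bins + rotation_hz * t) % 1.0
        r, g, b = colorsys.hls_to_rgb(hue, 0.5, 1.0)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors

if __name__ == "__main__":
    print(spectrum_colors(4, t=0.0))
    print(spectrum_colors(4, t=5.0))
```

Each bin keeps its position relative to its neighbours while the whole palette drifts, which is one simple way to avoid a frozen bin-to-color mapping.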
  - doctorhandshake 12 minutes ago
    Thanks! Great article - would like to read one about the color rabbit hole pls ;)
- PaulHoule 52 minutes ago
  Yeah, "diabolical" overstates it. It isn't a wicked problem
  https://en.wikipedia.org/wiki/Wicked_problem
  Kinda funny but I am a fan of green LED light to supplement natural light on hot summer days. I can feel the radiant heat from LED lights on my bare skin and since the human eye is most sensitive to green light I feel the most comfortable with my LED strip set to (0,255,0)
  - scottlawson 44 minutes ago
    I'd actually argue it has some wicked problem characteristics. The input space is enormous (all possible audio), perception is subjective and nonlinear, and there's no objective function to optimize against, only "does this feel right?". Every solution you try reframes what "good" means. It's not as hard as social planning but is way harder than it sounds, no pun intended.
    - PaulHoule 7 minutes ago
      Ever seen https://www.youtube.com/watch?v=oNyXYPhnUIs ? There are a lot of things people might think feels right.
WarmWash 41 minutes ago
The real killer is that humans don't hear frequencies, they hear instruments, which are a stack of frequencies that roughly sometimes correlate with a frequency range.
I wonder if transformer tech is close to achieving real-time audio decoding, where you can split a track into its component instruments, and light show off of that. Think those fancy Christmas time front yard light shows as opposed to random colors kind of blinking with what maybe is a beat.
milleramp 45 minutes ago
This guy has been making music controlled LED items, boxes and wrist bands. https://www.kickstarter.com/projects/markusloeffler/lumiband...
menno-dot-ai 1 hour ago
Woow, this was my first hardware project right around the time it released! I remember stapling a bunch of LED strips around our common room and creating a case for the pi + power supply by drilling a bunch of ventilation + cable holes in a wooden box.
And of course, by the time I got it to work perfectly I never looked at it again. As is tradition.
- scottlawson 42 minutes ago
  That's awesome to hear! Sometimes the journey is the destination, it's a great project to get started with electronics.
mdrzn 3 hours ago
Always been very interested in audio-reactive LED strips or LED bulbs. I've been using a Windows app to control my LIFX lights for years but lately it hasn't been maintained and it won't connect to my lights anymore.
I tried recreating the app (and I can connect via BT to the lights) but writing the audio-reactive code was the hardest part (and I still haven't managed to figure out a good rule of thumb or something). I mainly use it when listening to EDM or club music, so it's always a classic 4/4 110-130bpm signature, yet it's hard to have the lights react on beat.
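One common starting point for on-beat reactions is an energy threshold detector: flag a beat when the bass energy of the current frame jumps well above its recent average. The sketch below is only that, a starting point under stated assumptions. The class name, the 1.5x threshold, the one-second history and the 180 bpm ceiling are all illustrative choices, and the caller is assumed to supply a bass-band energy value at a fixed frame rate.

```python
from collections import deque

class EnergyBeatDetector:
    """Flags a beat when bass energy exceeds a multiple of its recent mean."""

    def __init__(self, frame_rate=50.0, history_s=1.0, threshold=1.5, max_bpm=180.0):
        self.history = deque(maxlen=int(frame_rate * history_s))
        self.threshold = threshold
        # Refractory period: ignore triggers closer together than max_bpm allows.
        self.min_gap = int(frame_rate * 60.0 / max_bpm)
        self.since_last = self.min_gap

    def update(self, bass_energy):
        """Feed one frame's bass energy; returns True if it looks like a beat."""
        self.since_last += 1
        avg = sum(self.history) / len(self.history) if self.history else 0.0
        is_beat = (
            len(self.history) == self.history.maxlen  # wait for a full history
            and bass_energy > self.threshold * avg
            and self.since_last >= self.min_gap
        )
        self.history.append(bass_energy)
        if is_beat:
            self.since_last = 0
        return is_beat

if __name__ == "__main__":
    # Synthetic 120 bpm kick at 50 frames/s: one loud frame every 25 frames.
    det = EnergyBeatDetector()
    frames = [1.0 if i % 25 == 0 else 0.05 for i in range(200)]
    print([i for i, e in enumerate(frames) if det.update(e)])
```

Real club tracks are messier than this synthetic input (sidechained bass, breakdowns, builds), so the threshold and history length would need tuning by ear.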
JKCalhoun 2 hours ago
I made a decent audio visualizer using the MSGEQ7 [1]. It buckets a count for seven audio frequency ranges—an Arduino would poll on every loop. It looks like the MSGEQ7 is not a standard part any longer unfortunately.
(And it looks like the 7 frequencies are not distributed linearly—perhaps closer to the mel scale.)
I tried using one of the FFT libraries on the Arduino directly but had no luck. The MSGEQ7 chip is nice.
[1] https://cdn.sparkfun.com/assets/d/4/6/0/c/MSGEQ7.pdf
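The spacing question can be checked with a few lines of arithmetic. The seven center frequencies below are the ones listed in the linked datasheet; the mel conversion uses the common `2595 * log10(1 + f / 700)` formula, which is one of several mel definitions and is an assumption here.

```python
import math

# Band center frequencies from the MSGEQ7 datasheet, in Hz.
MSGEQ7_CENTERS_HZ = [63, 160, 400, 1000, 2500, 6250, 16000]

def hz_to_mel(f):
    """Convert Hz to mel using the common 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

pairs = list(zip(MSGEQ7_CENTERS_HZ, MSGEQ7_CENTERS_HZ[1:]))

# Ratio between neighbouring bands: constant ratio means logarithmic spacing.
ratios = [hi / lo for lo, hi in pairs]

# Mel distance between neighbouring bands: constant steps would mean mel spacing.
mel_steps = [hz_to_mel(hi) - hz_to_mel(lo) for lo, hi in pairs]

if __name__ == "__main__":
    print("ratios:   ", [round(r, 2) for r in ratios])
    print("mel steps:", [round(m) for m in mel_steps])
```

The ratios all come out close to 2.5, so the bands are spaced logarithmically. The mel steps grow from band to band rather than staying constant, so the layout is not mel-spaced in the strict sense, although the two scales behave similarly at higher frequencies.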
- empyrrhicist 2 hours ago
  Have you ever seen anything like a MSGEQ14 or equivalent? It would be cool to go beyond 7 in such a simple-to-use chip, but I haven't seen one.
wolvoleo 56 minutes ago
Thanks for this! Exactly the thing I'm struggling with now. Making decent visualisation for music based on ESP32-S3.
londons_explore 2 hours ago
The mel spectrum is the first part of a speech recognition pipeline...
But perhaps you'd get better results if more of an ML speech/audio recognition pipeline were included?
Eg. the pipeline could separate out drum beats from piano notes, and present them differently in the visualization?
An autoencoder network trained to minimize perceptual reconstruction loss would probably have the most 'interesting' information at the bottleneck, so that's the layer I'd feed into my LED strip.
- calibas 2 hours ago
  I was playing around with this recently, but the problem I encountered is that most AI analysis techniques like stem separation aren't built to work in real-time.
iamjackg 1 hour ago
Scott's work is amazing.
Another related project that builds on a similar foundation: https://github.com/ledfx/ledfx
panki27 2 hours ago
Had a similar setup based on an Arduino, 3 hardware filters (highs/mids/lows) for audio and a serial connection. Serial was used to read the MIDI clock from a DJ software.
This allowed the device to count the beats, and since most modern EDM music is 4/4 that means you can trigger effects every time something "changes" in the music after synching once.
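As a rough sketch of the counting logic: MIDI clock sends 24 timing pulses (status byte 0xF8) per quarter note, so beats, bars and longer phrases fall out of simple division. The `BarCounter` class and the 8-bar phrase length are hypothetical, and resetting on the MIDI Start byte (0xFA) stands in for the one-time sync described above.

```python
MIDI_CLOCK = 0xF8          # timing clock, 24 per quarter note
MIDI_START = 0xFA          # transport start
PULSES_PER_QUARTER = 24

class BarCounter:
    """Turns a stream of MIDI real-time bytes into beat/bar/phrase events."""

    def __init__(self, beats_per_bar=4, bars_per_phrase=8):
        self.beats_per_bar = beats_per_bar
        self.beats_per_phrase = beats_per_bar * bars_per_phrase
        self.pulses = 0

    def feed(self, status_byte):
        """Return 'phrase', 'bar', 'beat', or None for a single byte."""
        if status_byte == MIDI_START:
            self.pulses = 0      # the next clock pulse is the downbeat
            return None
        if status_byte != MIDI_CLOCK:
            return None
        pulse = self.pulses
        self.pulses += 1
        if pulse % PULSES_PER_QUARTER != 0:
            return None
        beat = pulse // PULSES_PER_QUARTER
        if beat % self.beats_per_phrase == 0:
            return "phrase"      # good moment for a big effect change
        if beat % self.beats_per_bar == 0:
            return "bar"
        return "beat"

if __name__ == "__main__":
    counter = BarCounter()
    stream = [MIDI_START] + [MIDI_CLOCK] * (PULSES_PER_QUARTER * 8)
    print([e for e in map(counter.feed, stream) if e])
```

Reading the bytes from a serial port is left out, since that part depends on the hardware and library in use.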
- JKCalhoun 2 hours ago
  "3 hardware filters…"
  The classic "Color Organ" from the 70's.
rustyhancock 4 hours ago
More than 20 years ago or so I made a small LED display that used a series of LM567 (frequency detection ICs) and LM3914 (bar chart drivers) to make a simple histogram for music.
It was fiddly, and probably too inaccurate for a modern audience but I can't claim it was diabolically hard. Tuning was a faff but we were more willing to sit and tweak resistor and capacitor values then.
IshKebab 14 minutes ago
It's not that hard. I did a real-time version of the Beatroot algorithm decades ago that worked pretty well for being such a simple algorithm.
8cvor6j844qw_d6 3 hours ago
Are these available commercially for consumers?
p0w3n3d 4 hours ago
IANAE but I would go for an electric circuit, not electronic software that steers the LED. I think that nowadays, with LLM support, it can be easier and better to optimise it for the sake of latency.
- mrob 3 hours ago
  If you want minimum latency, you want the input side of a traditional vocoder, not an FFT. This is the part that splits the modulator signal into frequency bands and puts each one through an envelope follower. Instead of using the outputs of the envelope followers to modulate the equivalent frequency bands of a carrier signal, you can use them to drive the visualizer circuit.
  That can be done with analog electronics, but even half an analog vocoder needs a lot of parts. It's going to be cheaper and more reliable to simulate it in software. This uses entirely IIR filters, which are computationally cheap and calculated one sample at a time, so they have the minimum possible latency. I'd be curious if any LLM actually recognizes that an audio visualizer is half a vocoder instead of jumping straight to the obvious (and higher latency) FFT approach.
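A software version of that "half a vocoder" might look roughly like the sketch below: one band-pass IIR filter per band followed by an envelope follower, processed one sample at a time. The filter is a standard biquad band-pass using the widely published Audio EQ Cookbook coefficients. The class name, the Q of 4, and the attack and release times are assumptions picked for illustration.

```python
import math

class BandEnvelope:
    """One vocoder analysis band: biquad band-pass plus envelope follower."""

    def __init__(self, center_hz, sample_rate=44100.0, q=4.0,
                 attack_s=0.005, release_s=0.100):
        # Band-pass biquad with unity gain at the center frequency.
        w0 = 2.0 * math.pi * center_hz / sample_rate
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        self.b0 = alpha / a0
        self.b2 = -alpha / a0            # b1 is zero for this filter type
        self.a1 = -2.0 * math.cos(w0) / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0
        # One-pole smoothing coefficients for rising and falling levels.
        self.attack = 1.0 - math.exp(-1.0 / (attack_s * sample_rate))
        self.release = 1.0 - math.exp(-1.0 / (release_s * sample_rate))
        self.env = 0.0

    def process(self, x):
        """Consume one input sample and return the current band level."""
        y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        level = abs(y)                   # full-wave rectify
        coeff = self.attack if level > self.env else self.release
        self.env += coeff * (level - self.env)
        return self.env

if __name__ == "__main__":
    sr = 44100.0
    bands = [BandEnvelope(f, sr) for f in (100.0, 1000.0, 8000.0)]
    levels = [0.0] * len(bands)
    for n in range(int(0.2 * sr)):       # 0.2 s of a 1 kHz sine
        sample = math.sin(2.0 * math.pi * 1000.0 * n / sr)
        levels = [band.process(sample) for band in bands]
    print([round(v, 3) for v in levels])
```

Each output updates on every sample, so there is no block of audio to wait for as there is with an FFT frame. The trade-off is coarser frequency resolution per band.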
- avisser 1 hour ago
  For recorded music, you could always buffer however many milliseconds of audio to account for the processing.
askl 4 hours ago
Interesting. I'm currently in the process of building something with an audio-reactive LED strip but didn't come across this project yet. The WLED [1] ESP32 firmware seems to be able to do something similar or potentially more though.
[1] https://kno.wled.ge/
Edit: Oh wait, that project needs a PC or Raspberry Pi for audio processing. WLED does everything on the ESP32.
- MrBuddyCasino 1 hour ago
  WLED is decent but tbh the lag is very noticeable. Did you compare to this python thing?
  - askl 30 minutes ago
    No, haven't tried it.
    For my use case I want something fully portable and battery powered anyways. So the audio stuff should happen on the ESP32. (Or on my phone, that might work too)
- stavros 4 hours ago
  Yeah WLED does it fine, I've built a few and it works well.
- turbine401 4 hours ago
  [dead]
mockbolt 1 hour ago
[flagged]
- isoprophlex 1 hour ago
  Are you using multiple accounts to post the same comment?!
kbouck 2 hours ago
[flagged]
m3kw9 1 hour ago
How is it hard? Do an A to D, add a filter, do compute, then do a D to A.
- kennywinker 1 hour ago
  Not hard to do, hard to do well. Hiding all complexity with a hand wavey “do compute” doesn’t make that bit easy.
  - m3kw9 1 hour ago
    Yeah, I get it, the details are hard.
- cogman10 1 hour ago
  The article covers that.
  In short, audio and visual perception do not map perfectly. Humans don't have a linear perception of either so a perfect A to D then D to A conversion yields unsatisfying results.
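One small, concrete instance of the nonlinearity point is LED brightness: a PWM value that is linear in signal level tends to look washed out, so a gamma curve is often applied before writing to the strip. The sketch below is a generic illustration and not necessarily what the article does. The gamma of 2.2 is an assumed typical value and `gamma_correct` is a hypothetical helper.

```python
GAMMA = 2.2  # assumed value; commonly tuned by eye per LED type

def gamma_correct(level, gamma=GAMMA, max_value=255):
    """Map a linear level in [0, 1] to a PWM value in [0, max_value]."""
    level = min(max(level, 0.0), 1.0)    # clamp out-of-range input
    return round((level ** gamma) * max_value)

if __name__ == "__main__":
    for level in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(level, "linear:", round(level * 255), "gamma:", gamma_correct(level))
```

The audio side has its own counterpart in perceptual frequency scales such as mel, which the thread discusses elsewhere.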