
I went down a similar rabbit hole at the start of my PhD and I wish I’d written more of it up. One of my theories is that they combined effects quite often. For example, “Harder, Better, Faster, Stronger” seems more likely to be a talk box recorded on a single note, then looped, then run through an AutoTune rack unit with MIDI inputs to repitch it. I mention this a little bit in a talk I gave at ADC 2022: https://youtu.be/uX-FVtQT0PQ?feature=shared


Thanks for the talk link! I’m going to try writing some harmonizer code next, so your video is right up my alley. I believe IVL’s algorithm also isn’t FFT-based, which makes sense given the CPU power available in consumer tech at the time.

As for Harder, Better, Faster, Stronger: It’s difficult to know for sure without comment from Daft Punk themselves, but the DigiTech Talker has such a unique, throaty sound, and it’s all over the Human After All album. My confidence varies from guess to guess, but Harder, Better, Faster, Stronger is one of the guesses I’m more confident about, given how distinctive the Talker is. They also used so much DigiTech gear, especially on that album.

Hopefully they’ll see the article and let me know which bits are wrong.


This is cool - there’s some similar work here https://arxiv.org/pdf/2402.01571 which uses spiking neural networks (essentially Dirac pulses). I think the next step for this would be to learn a tonal embedding of the source alongside the event embedding so that you don’t have to rely on physically modelled priors. There’s some interesting work on guitar amp tone modelling that’s doing this already https://zenodo.org/records/14877373
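To make “alongside” a bit more concrete, here’s a rough sketch of the kind of conditioning I mean; the module names, dimensions, and the per-source embedding table are made up for illustration, not taken from either paper:

    import torch
    import torch.nn as nn

    class EventPlusToneDecoder(nn.Module):
        """Toy sketch: condition an audio decoder on an event (pulse-train)
        embedding concatenated with a learned tonal embedding of the source."""
        def __init__(self, event_dim=64, tone_dim=32, out_dim=128, n_sources=100):
            super().__init__()
            self.event_encoder = nn.GRU(input_size=1, hidden_size=event_dim, batch_first=True)
            self.tone_embedding = nn.Embedding(n_sources, tone_dim)  # one learned vector per source/instrument
            self.decoder = nn.Sequential(
                nn.Linear(event_dim + tone_dim, 256),
                nn.ReLU(),
                nn.Linear(256, out_dim),
            )

        def forward(self, events, source_id):
            # events: (batch, time, 1) spike/pulse train; source_id: (batch,) integer ids
            _, h = self.event_encoder(events)
            tone = self.tone_embedding(source_id)
            return self.decoder(torch.cat([h[-1], tone], dim=-1))

The point is just that the decoder is conditioned on both the “when” (events) and the “what it sounds like” (tone), so the tonal prior gets learned rather than physically modelled.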


How funny, I actually corresponded with one of the authors of the "Spiking Music..." paper when it first showed up on arxiv. I'll definitely give the amp-modeling paper a read, looks to be right up my alley!

Now that I understand the basics of how this works, I'd like to use a (much) more efficient version of the simulation as an infinite-dataset generator and try to learn a neural operator, or NeRF-like model, that, given a spring mesh configuration, a sparse control signal, and a time, can produce an approximation of the simulation in a parallel and sample-rate-independent manner. This also (maybe) opens the door to spatial audio, such that you could approximate sound-pressure levels at a particular point in time _and_ space. At this point, I'm just dreaming out loud a bit.
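For what it's worth, this is roughly the interface I'm imagining, sketched in PyTorch; the layer sizes and the Fourier-feature time encoding are placeholder choices, nothing validated:

    import math
    import torch
    import torch.nn as nn

    def fourier_features(t, n_freqs=16):
        # NeRF-style positional encoding of time so the MLP can represent audio-rate detail
        freqs = (2.0 ** torch.arange(n_freqs)) * math.pi
        angles = t[..., None] * freqs                      # (..., n_freqs)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    class AudioField(nn.Module):
        """Toy field: (mesh embedding, control embedding, time) -> sample value."""
        def __init__(self, mesh_dim=32, ctrl_dim=16, n_freqs=16, hidden=256):
            super().__init__()
            in_dim = mesh_dim + ctrl_dim + 2 * n_freqs
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, mesh_emb, ctrl_emb, t):
            feats = torch.cat([mesh_emb, ctrl_emb, fourier_features(t)], dim=-1)
            return self.net(feats).squeeze(-1)

Since every timestamp is an independent query, you could evaluate arbitrarily dense sets of timestamps in one batched call, which is where the parallelism and sample-rate independence would come from.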


This is possible but very very hard! Actually getting the model to converge on something that sounds reasonable will make you pull your hair out. It’s definitely a fun and worthwhile project though. I attempted something similar a few years ago. Good luck!


This is a hypothesis put forward by Gerald Langner in the last chapter of “The Neural Code of Pitch and Harmony” (2015). I personally think he was on to something, but sadly he died in 2016 before he could promote the work.


I’m the author of the high-resolution guitar model posted in a comment above. I have a drum transcription model that I’m getting ready for release soon, which should be state of the art for this. I’ll try to update this thread when I’m done.


> In 1912, based on research on B vitamins, Polish biochemist Casimir Funk condensed the term vital amines to vitamines.

Casimir Funk is one of the best names I’ve ever heard


Kazimierz Funk does not roll off the tongue ;) Casimir it is then.


rolls off just fine if you know how to pronounce it


Hydrofoil from Sorrento to Capri in choppy seas, on our honeymoon. Was the stuff of nightmares. My wife said we’d have to live on Capri because she was never setting foot on a boat again


It sounds like you’ve found it already, but the original pYIN implementation is in the Vamp plugin. Simon Dixon is my PhD supervisor but he’s quite busy. Feel free to email me questions in the meantime: j.x.riley@ the same university as Simon. There’s also a Python implementation in the librosa library, which might have a better license for your purposes.
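For the librosa route, the call looks roughly like this (the file path is a placeholder and the fmin/fmax range is just a sensible default for melodic material):

    import librosa

    # "melody.wav" is a placeholder path; pYIN expects mono audio
    y, sr = librosa.load("melody.wav", sr=None, mono=True)

    f0, voiced_flag, voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
    )
    # f0 is the frame-wise pitch in Hz, NaN for unvoiced frames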


High latency - agreed, but it depends on whether a GPU is available or not. If it is, then CREPE could theoretically be real-time. The pitch recognition error rates for the full CREPE model are still quite good, though. I’d be interested to see the data behind that claim.
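For reference, running the full model with the crepe Python package looks roughly like this (the file path is a placeholder; it runs on TensorFlow, so it will pick up a GPU automatically if one is visible):

    import crepe
    import librosa

    # placeholder path; CREPE works on 16 kHz mono internally
    audio, sr = librosa.load("vocal_take.wav", sr=16000, mono=True)

    # model_capacity ranges from 'tiny' to 'full'; viterbi smooths the pitch track
    time, frequency, confidence, activation = crepe.predict(
        audio, sr, model_capacity="full", viterbi=True, step_size=10
    )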


Simple techniques like autocorrelation can still recover a missing fundamental. To answer the GP post, using neural networks for this task is overkill for simple, clean signals, but it can be desirable if you need a) extremely high accuracy or b) robust results when there are signal degradations like background noise.
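As a toy illustration of the first point, here’s a numpy sketch (the sample rate, harmonics, and 1 ms lag cutoff are arbitrary choices) where the fundamental has been removed from the spectrum but still falls out of the autocorrelation peak:

    import numpy as np

    sr = 16000
    n = 4096
    t = np.arange(n) / sr
    # 200 Hz tone with the fundamental removed: only harmonics at 400-1000 Hz remain
    x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(2, 6))

    ac = np.correlate(x, x, mode="full")[n - 1:]   # autocorrelation at non-negative lags
    min_lag = sr // 1000                           # skip the zero-lag lobe (< 1 ms)
    lag = np.argmax(ac[min_lag:]) + min_lag
    print(f"estimated pitch: {sr / lag:.1f} Hz")   # ~200 Hz despite no energy at 200 Hz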


> how does authorization between the host and the forked work?

On fly.io you get a private network between machines so comms are already secure. For machines outside of fly.io it’s technically possible to connect them using something like Tailscale, but that isn’t the happy path.

> how do I make sure that the unit of work has the right IAM

As shown in the demo, you can customise what gets loaded on boot - I can imagine you’d use specific creds for each service as part of that boot process, based on the node’s role.

