I've always wanted to implement an FFT from scratch and play with it to separate audio waves, but then a full-time job came along. I guess once you separate the vocals from everything else, you can just feed them to a speech-to-text model?
To be completely honest, as a human who doesn't speak English natively, I find some lyrics hard to understand. I've seen native English speakers have this problem too. I think it's only natural for a NN to make the same mistakes.
Source separation is commonly done by applying masks to the spectrogram; deep learning is used to estimate the masks for the different instruments. As you mentioned, this is the approach we will follow in the subsequent steps.
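For context, here's a minimal sketch of how that masking step fits together, assuming a hypothetical trained model `predict_vocal_mask` that maps a magnitude spectrogram to per-bin values in [0, 1]:

```python
# Sketch of spectrogram masking for source separation.
# `predict_vocal_mask` is a stand-in for a trained network.
import numpy as np
from scipy.signal import stft, istft

def separate_vocals(audio, sample_rate, predict_vocal_mask):
    # STFT: complex spectrogram (freq bins x time frames)
    _, _, spec = stft(audio, fs=sample_rate, nperseg=2048)

    # The network predicts a soft mask from the magnitude only;
    # the mixture's original phase is reused for reconstruction.
    mask = predict_vocal_mask(np.abs(spec))  # values in [0, 1]

    # Element-wise masking keeps each bin's vocal energy
    # and suppresses everything else.
    _, vocals = istft(spec * mask, fs=sample_rate, nperseg=2048)
    return vocals

# Flow check with a trivial all-pass mask and placeholder audio:
if __name__ == "__main__":
    sr = 22050
    audio = np.random.randn(sr * 2)  # 2 s of noise as a stand-in signal
    vocals = separate_vocals(audio, sr, lambda mag: np.ones_like(mag))
```

The same structure repeats per source: one mask per instrument, all applied to the same mixture spectrogram.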