Just tried it with B.E.D - Walk Away[0], unfortunately it lost track of the lyrics after 30 secs (Model is "large-v3"). Will play around a bit more, as it would be great to have a working karaoke generator.
Some quick feedback:
- Needs a way to skip for-/backwards during playback to validate the result
- Sentences seem to be recognized (first letter has uppercasing), but periods aren't added
- Needs an option to edit results from the track analysis
indeed, I'm running to two problems on the analyzer side:
1. align model sliding off (especially w/ chorus/back vocals present)
2. transcript skipping parts of lyrics in lyrics-heavy tracks (I tried a lot of russian rap, lol)
happy for contributions as I'm not that experienced w/ machine learning side of the project, mostly it was emperical "tweak the parameters and look what is changed"
Some quick feedback:
Thanks for keeping it FOSS![0]: https://www.youtube.com/watch?v=_MFT4H3VoNE