I went into this imagining something like Synfig Studio (https://www.synfig.org/) or Moho (https://moho.lostmarble.com/). "Studio" is quite a stretch for what it actually is: lip-syncing static characters.
Also, Lost Marble (the makers of Moho) offers far more comprehensive (and comprehensible!) lip-sync with Papagayo: https://lostmarble.com/papagayo/
I get that you're using AI to boost capability with less effort, but at the moment, I think the more specialized tools are still a better avenue for this.
Lastly, I followed the link to Jellypod (https://www.jellypod.com/). It's pretty good, but falls into a vocal "uncanny valley". Even a human reading from a script wouldn't sound that perfect; the enunciations immediately come across as artificial.
Now, if this were an extension to Synfig (also open source!), it would be a much more interesting venture...