I believe I've developed a process to use Artificial Intelligence (AI) to automate the dialogue animation of 3D mesh character models. Let me start with the vision: I want to...
While this sounds like a dream (because it would be), I actually think the pieces for this are already out there. With the Google Cloud Speech API, I can post my audio file and get back reliable speech-to-text conversion with per-word confidence scores. If, in our Python script, we set:
enable_word_time_offsets=True
we get the text results with time offsets for each word (see the sketch below). I'm going to check with Google, but I bet there is a debug flag available to get time offsets for each individual phoneme as well. Why can't we use that data to re-associate the words with our timeline in Blender?
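To make this concrete, here is a minimal sketch of that speech-to-text step using the google-cloud-speech Python client. The audio file name and encoding settings are placeholders for whatever the dialogue track actually is, and the exact attribute names can vary a little between library versions:

# Sketch: per-word time offsets from the Google Cloud Speech API.
# Requires the google-cloud-speech package and API credentials.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # ask for per-word timestamps
)

# "dialogue.wav" is a placeholder path for the spoken dialogue track.
with open("dialogue.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)

for result in response.results:
    alternative = result.alternatives[0]
    print("Transcript:", alternative.transcript)
    print("Confidence:", alternative.confidence)
    for word_info in alternative.words:
        # start_time / end_time are durations (timedeltas in recent versions)
        print(
            f"{word_info.word}: "
            f"{word_info.start_time.total_seconds():.2f}s - "
            f"{word_info.end_time.total_seconds():.2f}s"
        )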
Working from the other end of the pipeline, there is Papagayo, a tool that breaks text down into mouth shapes, and a Blender add-on called Automatic Lipsync that brings the Papagayo data into Blender.
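One way to glue the two ends together might be to write the word timings out in Papagayo's Moho switch format, which is what the Automatic Lipsync add-on imports. As far as I can tell, that format is just a "MohoSwitch1" header followed by one "frame phoneme" pair per line. This is only a rough sketch with a single placeholder mouth shape per word; a real version would split each word into phonemes the way Papagayo does with its pronouncing dictionary:

# Sketch: turn (word, start, end) timings into a Moho/Papagayo switch file.
FPS = 24  # must match the Blender scene frame rate

def write_moho_dat(path, word_timings):
    """word_timings: list of (word, start_seconds, end_seconds) tuples."""
    with open(path, "w") as f:
        f.write("MohoSwitch1\n")
        for word, start, end in word_timings:
            start_frame = int(round(start * FPS))
            end_frame = int(round(end * FPS))
            # Placeholder: one generic mouth shape per word, then back to rest.
            # A real version would emit one line per phoneme of the word.
            f.write(f"{start_frame} etc\n")
            f.write(f"{end_frame} rest\n")

# Example with timings as returned by the speech API:
write_moho_dat("dialogue.dat", [("hello", 0.10, 0.55), ("world", 0.60, 1.10)])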
Mission: Don't we now have the technology to put ALL of these together into either an add-on, or better yet into core? (A very rough sketch of the Blender side follows.)
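For the "core" version, the word timings could even drive shape keys directly through Blender's Python API and skip the intermediate file entirely. This is a sketch under assumptions, not a working add-on: the object name "Character" and the shape key names "MBP"/"rest" are placeholders for whatever mouth shapes the rig actually has, and it simply opens the mouth for the duration of each word rather than doing real per-phoneme shapes:

# Sketch: keyframe a mouth shape key from word timings, run inside Blender.
import bpy

def keyframe_mouth(obj_name, word_timings):
    """word_timings: list of (word, start_seconds, end_seconds) tuples."""
    fps = bpy.context.scene.render.fps
    obj = bpy.data.objects[obj_name]
    shape_keys = obj.data.shape_keys.key_blocks
    key = shape_keys["MBP"]  # placeholder mouth shape key name
    for word, start, end in word_timings:
        frame_start = int(start * fps)
        frame_end = int(end * fps)
        # Close the mouth just before the word, open it for the word's
        # duration, then close it again afterwards.
        key.value = 0.0
        key.keyframe_insert("value", frame=frame_start - 2)
        key.value = 1.0
        key.keyframe_insert("value", frame=frame_start)
        key.keyframe_insert("value", frame=frame_end)
        key.value = 0.0
        key.keyframe_insert("value", frame=frame_end + 2)

keyframe_mouth("Character", [("hello", 0.10, 0.55), ("world", 0.60, 1.10)])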
I'm fairly new to Blender, so this is going to be above my skill level, but folks, ambitious as it is, I see no reason why it isn't possible. Where do I begin to make this happen, and what would be the most appropriate forum to further the discussion?
Resources:
A primer on using Google Cloud Speech API and how speech recognition works.
A page with instructions and example Python code for processing audio with time offsets.
The official page for Papagayo.
The official page for the Automatic Lipsync add-on.
And of course, Blender 3D!