I believe I've developed a process to use Artificial Intelligence (AI) to automate the dialogue animation of 3D mesh character models. Let me start with the vision: I want to...
While this sounds like a dream (because it would be), I actually think the pieces for this are already out there. With the Google Cloud Speech API, I can post my audio file and get back reliable speech-to-text conversion with per-word confidence scores. If, in our Python script, we set:
enable_word_time_offsets=True
we get the text results with time offsets for each word (see the sketch below). I'm going to check with Google, but I bet there is a debug flag available to get time offsets for each individual phoneme as well. Why can't we use that data to re-associate the words with our timeline in Blender?
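To make this concrete, here is a minimal sketch of that speech-to-text step using the google-cloud-speech Python client. The audio file name and encoding settings are placeholders for whatever the dialogue track actually is, and the exact attribute names can vary a little between library versions:

# Sketch: per-word time offsets from the Google Cloud Speech API.
# Requires the google-cloud-speech package and API credentials.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # ask for per-word timestamps
)

# "dialogue.wav" is a placeholder path for the spoken dialogue track.
with open("dialogue.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)

for result in response.results:
    alternative = result.alternatives[0]
    print("Transcript:", alternative.transcript)
    print("Confidence:", alternative.confidence)
    for word_info in alternative.words:
        # start_time / end_time are durations (timedeltas in recent versions)
        print(
            f"{word_info.word}: "
            f"{word_info.start_time.total_seconds():.2f}s - "
            f"{word_info.end_time.total_seconds():.2f}s"
        )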
Working from the other end of the pipeline, there is Papagayo, a tool that breaks text down into mouth shapes, and a Blender add-on called Automatic Lipsync that brings the Papagayo data into Blender.
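One way to glue the two ends together might be to write the word timings out in Papagayo's Moho switch format, which is what the Automatic Lipsync add-on imports. As far as I can tell, that format is just a "MohoSwitch1" header followed by one "frame phoneme" pair per line. This is only a rough sketch with a single placeholder mouth shape per word; a real version would split each word into phonemes the way Papagayo does with its pronouncing dictionary:

# Sketch: turn (word, start, end) timings into a Moho/Papagayo switch file.
FPS = 24  # must match the Blender scene frame rate

def write_moho_dat(path, word_timings):
    """word_timings: list of (word, start_seconds, end_seconds) tuples."""
    with open(path, "w") as f:
        f.write("MohoSwitch1\n")
        for word, start, end in word_timings:
            start_frame = int(round(start * FPS))
            end_frame = int(round(end * FPS))
            # Placeholder: one generic mouth shape per word, then back to rest.
            # A real version would emit one line per phoneme of the word.
            f.write(f"{start_frame} etc\n")
            f.write(f"{end_frame} rest\n")

# Example with timings as returned by the speech API:
write_moho_dat("dialogue.dat", [("hello", 0.10, 0.55), ("world", 0.60, 1.10)])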
Mission: Don't we now have the technology to put ALL of these together into either an add-on, or better yet into core? (A very rough sketch of the Blender side follows.)
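For the "core" version, the word timings could even drive shape keys directly through Blender's Python API and skip the intermediate file entirely. This is a sketch under assumptions, not a working add-on: the object name "Character" and the shape key names "MBP"/"rest" are placeholders for whatever mouth shapes the rig actually has, and it simply opens the mouth for the duration of each word rather than doing real per-phoneme shapes:

# Sketch: keyframe a mouth shape key from word timings, run inside Blender.
import bpy

def keyframe_mouth(obj_name, word_timings):
    """word_timings: list of (word, start_seconds, end_seconds) tuples."""
    fps = bpy.context.scene.render.fps
    obj = bpy.data.objects[obj_name]
    shape_keys = obj.data.shape_keys.key_blocks
    key = shape_keys["MBP"]  # placeholder mouth shape key name
    for word, start, end in word_timings:
        frame_start = int(start * fps)
        frame_end = int(end * fps)
        # Close the mouth just before the word, open it for the word's
        # duration, then close it again afterwards.
        key.value = 0.0
        key.keyframe_insert("value", frame=frame_start - 2)
        key.value = 1.0
        key.keyframe_insert("value", frame=frame_start)
        key.keyframe_insert("value", frame=frame_end)
        key.value = 0.0
        key.keyframe_insert("value", frame=frame_end + 2)

keyframe_mouth("Character", [("hello", 0.10, 0.55), ("world", 0.60, 1.10)])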
I'm fairly new to Blender, so this is going to be above my skill level, but folks, ambitious as it is, I see no reason why it isn't possible. Where do I begin to make this happen, and what would be the most appropriate forum to further the discussion?
Resources:
A primer on using Google Cloud Speech API and how speech recognition works.
A page with instructions and example Python code for processing audio with time offsets.
The official page for Papagayo.
The official page for the Automatic Lipsync add-on.
And of course, Blender 3D!