Using AI to Automate Dialogue Animation of 3D Mesh Character Models

I believe I've developed a process to use Artificial Intelligence (AI) to automate the dialogue animation of 3D mesh character models. Let me start with the vision: I want to...

  1. Record an audio track of character dialogue.
  2. Analyse the audio track using speech-to-text artificial intelligence.
  3. Receive the speech-to-text results, including time offset information for words and phonemes.
  4. Import those encoded results into a Blender 3D animation timeline.
  5. Blender uses those results to place each phoneme on the timeline.
  6. A mouth shape from the character's pose library is selected for each phoneme at its position on the timeline.
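
Steps 5 and 6 boil down to two small conversions: seconds to frame numbers, and phonemes to pose names. Here is a minimal sketch in plain Python, assuming the phoneme timings are already available as (phoneme, start_seconds) pairs. The phoneme labels, the POSE_FOR_PHONEME table, the pose names and the function name are all made up for illustration; a real pose library would define its own.

```python
# Hypothetical mapping from phoneme labels to mouth-shape pose names.
# Both the labels and the pose names are placeholders, not a standard.
POSE_FOR_PHONEME = {
    "AA": "mouth_open",
    "M": "mouth_closed",
    "F": "mouth_teeth_on_lip",
    "OW": "mouth_round",
}
REST_POSE = "mouth_rest"  # fallback for phonemes missing from the table


def phonemes_to_keyframes(timed_phonemes, fps=24, start_frame=1):
    """Turn (phoneme, start_seconds) pairs into (frame, pose_name) pairs."""
    keyframes = []
    for phoneme, start_seconds in timed_phonemes:
        # Convert the time offset to the nearest frame on the timeline.
        frame = start_frame + round(start_seconds * fps)
        pose = POSE_FOR_PHONEME.get(phoneme, REST_POSE)
        # Skip consecutive duplicates so the same pose is not keyed twice in a row.
        if keyframes and keyframes[-1][1] == pose:
            continue
        keyframes.append((frame, pose))
    return keyframes


if __name__ == "__main__":
    timings = [("M", 0.0), ("AA", 0.125), ("M", 0.5), ("ZZ", 1.0)]
    for frame, pose in phonemes_to_keyframes(timings):
        print(frame, pose)
```

Inside Blender, each (frame, pose) pair would then be turned into a keyframe through the bpy API. That part is left out here because it depends on how the rig and pose library are set up.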

While this sounds like a dream (because it would be), I actually think the pieces for this are already out there. With the Google Speech API, I can post my audio file to the AI and get reliable speech-to-text conversion with word confidence scores. If, in our Python script, we set enable_word_time_offsets=True in the recognition config, we get the text results with time offsets for each word. I'm going to check with Google, but I bet there is a debug flag available to get the time offsets for each individual phoneme. Why can't we use that data to re-associate the words with our timeline in Blender?
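
To make the word-offset part concrete: in the REST form of the API, word time offsets are requested with the enableWordTimeOffsets field of the request config, and each recognised word comes back with startTime and endTime strings such as "1.300s". The sketch below builds such a request body and reads the offsets out of a response. The bucket URI and the sample response are hand-written placeholders, not real API output, and the helper names are mine. It does not call the service or cover phoneme-level offsets, which I have not confirmed exist.

```python
import json


def build_request(gcs_uri, language_code="en-US"):
    """Build a speech:recognize request body that asks for word time offsets."""
    return {
        "config": {
            "languageCode": language_code,
            "enableWordTimeOffsets": True,
        },
        "audio": {"uri": gcs_uri},
    }


def parse_offset(value):
    """Convert a duration string such as '1.300s' into seconds as a float."""
    return float(value.rstrip("s"))


def word_timings(response):
    """Return (word, start_seconds, end_seconds) for the top alternative of each result."""
    timings = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            timings.append(
                (info["word"], parse_offset(info["startTime"]), parse_offset(info["endTime"]))
            )
    return timings


# Hand-written sample in the response shape described above; NOT real API output.
SAMPLE_RESPONSE = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "hello world",
                    "words": [
                        {"word": "hello", "startTime": "0s", "endTime": "0.500s"},
                        {"word": "world", "startTime": "0.500s", "endTime": "1.100s"},
                    ],
                }
            ]
        }
    ]
}

if __name__ == "__main__":
    # Placeholder URI; replace with a real Cloud Storage path to an audio file.
    print(json.dumps(build_request("gs://my-bucket/dialogue.flac"), indent=2))
    for word, start, end in word_timings(SAMPLE_RESPONSE):
        print(f"{word}: {start:.2f}s to {end:.2f}s")
```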

Working from the other end of the pipeline, there is Papagayo, a tool that turns text into mouth shapes, and a Blender addon called Automatic Lipsync that brings the Papagayo data into Blender.
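
For the hand-off to those existing tools, one option would be to write the timings in the same format Papagayo exports. As far as I can tell, that is the Moho switch format: a MohoSwitch1 header line followed by one "frame mouth-shape" pair per line, with shape names such as AI, E, O, MBP and rest. Treat those format details as an assumption to check against a real Papagayo export. The function names below are hypothetical.

```python
import io


def write_moho_switch(keyframes, stream):
    """Write (frame, mouth_shape) pairs as Moho switch data (assumed Papagayo export format)."""
    stream.write("MohoSwitch1\n")
    # Sort by frame so the file reads in timeline order.
    for frame, shape in sorted(keyframes):
        stream.write(f"{frame} {shape}\n")


def read_moho_switch(stream):
    """Read Moho switch data back into a list of (frame, mouth_shape) pairs."""
    lines = [line.strip() for line in stream if line.strip()]
    if not lines or lines[0] != "MohoSwitch1":
        raise ValueError("not a MohoSwitch1 file")
    pairs = []
    for line in lines[1:]:
        frame, shape = line.split(maxsplit=1)
        pairs.append((int(frame), shape))
    return pairs


if __name__ == "__main__":
    buffer = io.StringIO()
    write_moho_switch([(1, "rest"), (4, "MBP"), (7, "AI"), (13, "rest")], buffer)
    print(buffer.getvalue(), end="")
```

Going through a file like this keeps the speech-recognition side independent of Blender, at the cost of being limited to whatever mouth-shape names the importer understands.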

Mission: Don't we now have the technology to put ALL of these together into either an addon or, better yet, Blender core?

I'm fairly new to Blender, so this is going to be above my skill level. But folks, ambitious as it is, I see no reason why this isn't possible. Where do I begin to make this happen, and what would be the most appropriate forum to further the discussion?


A primer on using Google Cloud Speech API and how speech recognition works.

A page with instructions and example Python code for processing audio with time offsets.

The official page for Papagayo.

The official page for Lip Sync Add-on.

And of course, Blender 3D!


