Walking and Talking

by Ron Gilbert
Aug 31, 2015

Quick blog post today.

Spent much of the weekend and today rewriting my engine's animation system to give it more flexibility for doing walking, talking and other multi-layered animations.

The main issue is wanting to play a talking animation, which moves the mouth and head, but then wanting to play other animations, like walking, pointing, shrugs and other gestures. If every animation has to have a version with talking, the permutations quickly make you want to bang your head repeatedly against your desk.

SCUMM had a great animation system called Byle. It could play several animations simultaneously, and as long as they used different layers it would play them all at once. It made doing things like playing a talking animation and then playing a shrug really easy. It was also easy for the artist, because they would just have an attach point for the head and could move the body as needed.
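To make the layering idea concrete, here is a minimal sketch in Python. It is not the SCUMM system or the Thimbleweed Park engine; the class names, layer names and frame names are all hypothetical. It assumes each animation only claims the layers it has frames for, so a talk animation (head, mouth) and a shrug (body) can run side by side, and a later animation simply takes over any layer it shares with an earlier one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Animation:
    name: str
    # Hypothetical layout: layer name -> looping list of frame ids.
    frames: Dict[str, List[str]]


@dataclass
class Actor:
    # layer name -> (animation currently driving it, tick it started on)
    playing: Dict[str, Tuple[Animation, int]] = field(default_factory=dict)

    def play(self, anim: Animation, tick: int = 0) -> None:
        # Claim only the layers this animation uses; other layers keep playing.
        for layer in anim.frames:
            self.playing[layer] = (anim, tick)

    def frames_at(self, tick: int) -> Dict[str, str]:
        # Pick the current frame for every active layer.
        shown = {}
        for layer, (anim, start) in self.playing.items():
            frames = anim.frames[layer]
            shown[layer] = frames[(tick - start) % len(frames)]
        return shown


talk = Animation("talk", {"head": ["head_normal", "head_up"],
                          "mouth": ["m0", "m1", "m2"]})
shrug = Animation("shrug", {"body": ["shrug0", "shrug1"]})

reyes = Actor()
reyes.play(talk)           # drives head and mouth
reyes.play(shrug, tick=1)  # drives body, talk keeps running
print(reyes.frames_at(2))
```

A renderer would then draw the body frame and place the head and mouth frames at the body's attach point.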

This stuff is pretty routine for 3D animation systems or 2D animation systems like Spine, but for pure bitmap graphics, I have yet to find a good one. So, as often happens, when I can't find a tool that works, I just say fuck it, I'll write my own.

Gary started by chunking Reyes' animations into head, body and mouth layers and we went through several iterations of how to best layer them for maximum flexibility and (more importantly) ease of creation.

Our system also allows for the option of lip-syncing, if we decide to do that. The system understands the basic vowel positions and can set the mouth frame based on external input. Right now it's just a weighted random number, but if we had lip syncing data, it would be fed in instead.
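As a rough illustration of "weighted random now, lip-sync data later", the sketch below picks a mouth frame either from an optional timed track or, when there is no track, from a weighted random choice. The frame names, the weights and the track format (a sorted list of start time and frame pairs) are assumptions made up for this example, not the engine's actual data.

```python
import random
from bisect import bisect_right

# Hypothetical mouth frames and weights; the real set is not given in the post.
MOUTH_FRAMES = ["closed", "A", "E", "O", "U"]
WEIGHTS = [3, 2, 2, 1, 1]


def mouth_frame(time_s, rng, track=None):
    """Return the mouth frame to show at time_s (seconds into the line).

    track is an optional list of (start_time_s, frame) pairs sorted by time.
    """
    if track:
        starts = [start for start, _ in track]
        i = bisect_right(starts, time_s) - 1  # last cue at or before time_s
        return track[i][1] if i >= 0 else "closed"
    # No lip-sync data: fall back to a weighted random frame.
    return rng.choices(MOUTH_FRAMES, weights=WEIGHTS, k=1)[0]


rng = random.Random()
print(mouth_frame(0.3, rng))  # random frame
print(mouth_frame(0.3, rng, [(0.0, "closed"), (0.2, "A"), (0.5, "O")]))
```

The point of the design is that the caller does not care where the frame came from, so a pre-processed lip-sync track can be dropped in later without touching the animation code.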

I don't know if we'll do lip syncing. It's an amazing amount of work unless it's automated, and I haven't looked into the current state of automated lip-syncing. Eight years ago it was crap, but we now live in a future of self-driving cars and staying in strangers' houses instead of hotels, so who knows what other crazy things have been invented.

If you know of any software that can pre-process audio files and produce a lip-sync track, let me know. It doesn't have to work in real time; it can (and probably should) be a pre-processing stage.

Gary and I also decided to do head bobbing along with mouth movement, but reduced from what was in Monkey Island. Back at Lucasfilm, when we went from the large-headed characters to more realistically proportioned heads, the mouth became a single pixel and it was hard to tell if Guybrush was even talking.

With the larger stylized heads in Thimbleweed Park, it's easy to tell if they are talking, but having no movement of the head felt too static, so the plan we ultimately settled on was to have two head positions: normal and up. The head is in the normal position 80% of the time and randomly goes to the up position the other 20%. I think it works well, but I'm sure we will endlessly tweak it in the coming months.
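The 80/20 rule is simple enough to sketch in a few lines of Python. This is only an illustration: the function name and the idea of drawing a fresh position on every animation frame are assumptions, since the post does not say how often the engine re-rolls.

```python
import random


def head_position(rng, up_chance=0.2):
    """Return "up" with probability up_chance, otherwise "normal"."""
    return "up" if rng.random() < up_chance else "normal"


rng = random.Random()
print([head_position(rng) for _ in range(10)])
```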

As with a lot of what we post, what you're seeing is not final animation, just something we put together as a test, so don't nitpick. There is still a lot of work to do.

- Ron

P.S. IGNORE THE ICONS!

Mattias Cedervall - Aug 31, 2015 at 22:01

Great work, Ron! :-)

vegetaman - Aug 31, 2015 at 22:12

Looks neat Ron, but one question on the head movement -- why "up" instead of "down"? Just curious.

Ron Gilbert ✓ - Aug 31, 2015 at 22:14

We did both and down looked odd due to the chin.

Josejulio Martínez - Aug 31, 2015 at 22:34

Maybe this could be something to start with for the text to lip-synch http://www.cs.dartmouth.edu/reports/TR2004-501.pdf

Mikael Karlsson - Aug 31, 2015 at 22:40

Ron, you and your team should add bobblehead figurines that are similar to the characters of Thimbleweed Park, so we can spend more cash on this awesomeness (And just to make a quick shout out to Dave Miller. Cash would be nice!)

Steve - Aug 31, 2015 at 23:58

Are the characters going to blink?

Ron Gilbert ✓ - Sep 01, 2015 at 11:06

Yes, they will blink. There will also be fidget animations when they are standing, plus gestures while talking. This is just first pass.

Steve - Sep 01, 2015 at 18:24

Cool! Can't wait to see more - keep up the great work :)

patoland - Sep 01, 2015 at 01:07

Hi Ron, check this out!
http://www.lostmarble.com/papagayo/index.shtml
is a free Lip-syncing software made in Python for windows and mac...but maybe Python is not suited for your needs...
amazing blog! Tnx!

Soong - Sep 01, 2015 at 05:19

Papagayo is awesome! I don't think it matters that it's written in Python because it would not become part of the game. It would only be used to produce lip-syncing data that can then be read by Ron's engine.

Mischa Magyar - Sep 01, 2015 at 01:27

The lip movement in the video seems rather fast. Will the characters have different talking speeds? :-)

Martin -enthusi- Wendt - Sep 01, 2015 at 01:28

Maybe you can get along with parsing text line on the fly and then react on vowels? For mouth open/shut (which I happen to know a small 8bit project does) but then also for the occasional headbump (which I never missed actually before).
I would figure your head-bumpiness-likelihood increases during saying 'a' rather than 'm' ?
SURELY everyone will notice. Still, it makes you feel good.

Bogdan Barbu - Sep 01, 2015 at 04:09

That doesn't cut it because the pace of human speech is unpredictable and they want to sync the animation with the *voice acting*, not the text parsed at some pace.

Davide - Sep 01, 2015 at 02:37

facefx.com

tomimt - Sep 01, 2015 at 03:42

I think that might be a bit of an overkill

Davide - Sep 01, 2015 at 17:13

It is the only tool I know of that takes batches of audio data plus transcripts, without any special annotations, and produces good output. Of course, they would only use the output metadata (or use some custom code to produce data; it is open source for licensees), without needing the 3D part. The actual benefit is in the audio processing.

Davide - Sep 01, 2015 at 17:20

There are more focused alternatives,
http://www.annosoft.com/lipsync-sdks
but I've no idea of their quality.

Geoff Paulsen - Sep 01, 2015 at 03:09

Keep up the hard work. I can't wait to play with the system years from now. (you're still planning on open sourcing it right?)

Mario - Sep 01, 2015 at 03:27

@RON: how will the talking look (head up & down) if you see the character from the side? Or can he only talk with his face turned toward the screen?

Barney Gumble - Sep 01, 2015 at 03:29

I like it!

Daniel Wolf - Sep 01, 2015 at 03:40

A few years ago, I had a look at a number of tools that claimed to extract lip-sync data from sound files. Back then, none of them really convinced me. As a matter of fact, the best automated lip-sync I know of was (don't laugh!) in the old Sierra adventures.

Look at this video of King's Quest VI from 1992: <https://youtu.be/fEUFzSb5utk>. There's a lot of dialog right at the start. From what I understand, all the lip-sync data was extracted automatically from the voice recordings. I'm not sure if the company (Bright Star Technology) is still around, but you might try contacting Elon Gasper, who founded the company and apparently still holds the patents. <https://www.linkedin.com/in/elongasper> Even if you can't use their technology, he might have an idea as to the current state of the art.

LogicDeLuxe - Sep 01, 2015 at 08:28

I'm not a fan of those pop up close ups. I prefer full screen close ups like it was done in Loom and Monkey 1. I suspect disk space was the reason, they don't move their lips at all in close ups, right?

Zak Phoenix McKracken - Sep 01, 2015 at 03:49

Uhmm... I don't know, but in my opinion, the head movements look strange.
Maybe I'm too tied to Maniac Mansion style: big head, only mouth movement.
I think that something like the "version 2" of Maniac Mansion should be enough: mouth movement, but fixed head. When the character is side-view, it can move its jawbone up and down.

jokie - Sep 01, 2015 at 03:52

Looking at that animation I'd say lip sync will probably not be worth the effort.

Besides, there is something retro about the characters randomly moving their mouths when talking. I like it. :)

Bogdan Barbu - Sep 01, 2015 at 04:20

Now that you have this animation, perhaps you can throw eye blinking in. You should perhaps leave it out of the talking animation because people usually blink only between sentences and for that you have to worry about lip syncing again. However, it could at least be done for idling and walking if it's not just a waste of resources.

Regarding the talking animation, I think the mouth looks better than I was expecting but the head bobbing is turning out to be not-so-great. Although I theoretically understand the animation, to me it doesn't register as slightly changing the orientation of the head but as if something weird and disturbing is happening to it.

Bogdan Barbu - Sep 01, 2015 at 04:21

s/this animation/this animation system/

Sorry.

Zak Phoenix McKracken - Sep 01, 2015 at 04:31

Look at Edit button : "WOW! It could be useful!"
Push Edit Button : "It doesn't seem to work."

Bogdan Barbu - Sep 01, 2015 at 08:25

When idling, they could also do other things like pick their noses, do push-ups (because who would do that?), accidentally drop things in the inventory and quickly pick them up, etc.

Dan - Sep 01, 2015 at 12:58

No disgusting animations like nose picking, please! Blowing the nose would be better.

Bogdan Barbu - Sep 01, 2015 at 19:19

There's a corpse in that puddle and you're afraid of some nose picking? Besides, nose picking is funny because it's socially inappropriate™ (even though most of us do it when we're alone). That's why people love a good fart joke---or at least they used to until it started being overused. Blowing your nose isn't funny. NPC's could notice and yell things like "Hey, stop that!" etc.

KJL3000 - Sep 01, 2015 at 05:03

Lip Sync™ +1 ... would this work in every supported language?

Bogdan Barbu - Sep 01, 2015 at 07:06

Sure, since it relies on the sounds, not on their meaning. There are a few techniques that could work with some supervision and maybe tuning, such as hidden Markov models, neural networks, etc. I've done things like this in the past. Unfortunately, I don't know of a library that would do this (nor have I looked) and it seems like much more work than it's worth to implement one for a project like this.

Bogdan Barbu - Sep 01, 2015 at 07:08

* If we all sneak enough mistakes in, they might add that Edit facility. :p

Dan - Sep 01, 2015 at 13:31

I don't think so, since only an english voice recording is announced up to now. Only the subtitles will be translated to some other languages.

Bogdan Barbu - Sep 01, 2015 at 19:21

Good point. But it should work in principle, at least.

Thomas Oger - Sep 01, 2015 at 05:22

A company I used to work with for animated TV series both 2d and 3d: http://www.syncmagic.com/
I think they do English, French, German, Spanish and Italian.
They used to deliver .anim files for our pipeline but you can really ask them for any kind of outputs.

Might be worth a try.

Dan - Sep 01, 2015 at 06:05

I like it.

How about keeping the ears in a constant position in order to prevent the impression of a stretching neck.

Besides, you could individualize the different characters. For example, some smart-alec people tend to raise their eyebrows, and nervous people tend to blink at a high frequency when they are talking. And narcissistic people often smile while they are speaking.

Zak Phoenix McKracken - Sep 01, 2015 at 06:17

Like these characters of Ace Attorney™ (Nintendo):
http://www.court-records.net/animation/armstrong-sweating%28b%29.gif
http://www.court-records.net/animation/moe-tsk%28b%29.gif
http://www.court-records.net/animation4/brushel-smile%28b%29.gif

Mattias Cedervall - Sep 01, 2015 at 10:48

You may be right about the ears.

Man with a duck - Sep 01, 2015 at 06:28

I think the big-headed look is working pretty well in this particular scene.

Alex - Sep 01, 2015 at 06:32

Looks great! So exciting to see :-)

If it was just up to me, I wouldn't worry about the lip-synching - it's possible it won't add much to the game at the cost of too much work developing it unless there's something that will easily work against the audio files.

As a middle ground I might be tempted to have two versions of the speech text: one version that's displayed, and another one that could be configured with a couple of extra codes (like "[speed=1]" or "[pause=10]", etc., to be parsed with the animation), for pacing at more pertinent or repeated phrases and sentences.
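A small Python sketch of how such an annotated line could be split into display text and pacing cues. The tag syntax follows the two examples in the comment; the function name, the restriction to the speed and pause tags, and the choice to return cues as a simple list are assumptions for illustration.

```python
import re

# Matches the two example codes from the comment: [speed=N] and [pause=N].
TAG = re.compile(r"\[(speed|pause)=(\d+)\]")


def split_script(annotated):
    """Return (display_text, cues) for an annotated dialog line."""
    cues = [(m.group(1), int(m.group(2))) for m in TAG.finditer(annotated)]
    display = TAG.sub("", annotated)
    display = re.sub(r"\s{2,}", " ", display).strip()  # tidy leftover spaces
    return display, cues


text, cues = split_script("[speed=1]I mean I [pause=10]REALLY need to go.")
print(text)
print(cues)
```

A fuller version would also record where in the text each cue applies; this sketch only shows that one annotated source can yield both the displayed line and the animation hints.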

DorHajaj - Sep 01, 2015 at 06:40

It's time to tell you the truth Ron, i LOVE the ICONS!!!!!
There, I said it!

Pieter - Sep 01, 2015 at 06:58

I love all this information and inside look in making videogames but Ron it makes me wanna play this game tomorrow. IT LOOKS AWESOME!!

Grafekovic - Sep 01, 2015 at 07:06

"There is still a lot of work to do."

Hurry up, just nine months left ;-)

Natalija - Sep 01, 2015 at 07:13

The first icon is the badge and the second is....? A book? It looks odd.

Mattias Cedervall - Sep 01, 2015 at 10:29

What icons? I don't see any icons.

Christian - Sep 01, 2015 at 07:27

Hmm...it looks strange to me...and a lot of head bobbing. Maybe the idea with the ears is a good one (see a comment above) or decreasing the distance the head covers (looks like quite a big movement right now).

Marco Lizza - Sep 01, 2015 at 08:20

Since the very beginning of the Thimbleweed Park project, I almost instantly loved any output that Ron, Gary, David and Mark provided to us.

But this time, quite frankly, I think that the head bobbing doesn't fit at all. I would rather the character blink or perform some other "tiny" movement. The head bobbing appears way too exaggerated to me, and in normal life nobody extends his or her head so much when talking.

Jammet - Sep 02, 2015 at 10:42

In Monkey Island, characters used to nod and tilt their heads a lot while talking - and that looked good to me. This here is reduced from that, but it's simply the head going up, it's not "nodding" or tilting.

Marco Lizza - Sep 03, 2015 at 10:14

Yes, you are right. In fact, what Ron showed us is more neck/head protrusion than "nodding".

Guess they'll adjust this as they proceed...

Ron Gilbert ✓ - Sep 03, 2015 at 10:25

As I've said many many many times, you are getting a look at how games are actually made, not some PR washed look. The way games are made is you try something, live with it for a while and tweak it and tweak it and tweak it. Things do not come out fully formed, they never do. This is the process. We're not happy with the head bob, I've said that before, but it's going to stay and we'll tweak it, but not now. We want to live with it for a while. Please respect the fact that we are showing you crude first hacks at what we're doing. Monkey Island looked like crap for months and months. Every game does.

Marco Lizza - Sep 03, 2015 at 20:11

> Please respect the fact that we are showing you crude first hacks at what we're doing. Monkey Island looked like crap for months and months. Every game does.

I really respect your dedication and work, Ron. Also, I really appreciate that you are sharing with us your early and crude experiments.

That was just my personal consideration and feedback on how the head movement looked.

I didn't mean to be disrespectful at all. In fact, I know exactly how the "continuous refinement process" of software (and games) works. I should say that I have been using your exact approach to developing software for a long time.

We have just to be patient, yep.

tomimt - Sep 02, 2015 at 14:23

Keep in mind that what you see on the video is just a work in progress prototype, not a finished thing.

Marco Lizza - Sep 03, 2015 at 10:11

Well, this is quite obvious to us all. However, I feel that giving feedback on this could be appreciated, especially if we *don't* like something.

Charles - Sep 01, 2015 at 08:31

Lip syncing is completely unnecessary and resource intensive.
Just dial back the head bob, add random blinking, and move on.

Mattias Cedervall - Sep 01, 2015 at 10:49

Random blinking sounds good.

Ron Gilbert ✓ - Sep 01, 2015 at 11:03

There won't be lip-syncing for text only because people read at different speeds, but when there is audio, lip-syncing is very important. It really makes a huge difference. If I can find a tool that lip-syncs, it is just a batch process and isn't that resource intensive.

Mister T - Sep 04, 2015 at 14:44

Hmm, maybe the bobbing could be synced as well. It is difficult to judge when it is silent, but when there is a melody in the voice, the bobbing might make sense when it raises or just at the beginning of a sentence or depending on punctuation.

Ashley B - Sep 10, 2015 at 23:31

Does this mean any foreign language versions will require lots of lip-rework alongside translations?

Bogdan Barbu - Sep 01, 2015 at 12:36

The lip syncing wouldn't happen in real time but would be preprocessed. You may have seen similar things without knowing it. For instance, you may have seen CGI movies, like Finding Nemo or whatever. When you watch something like that, your computer isn't running a real-time ray tracer in the background---the rendering happened in a studio and what you see is a prerendered version.

LogicDeLuxe - Sep 01, 2015 at 08:38

The bobbing looks strange to me too. I think, I would prefer some gestures instead. Gestures worked very well in Monkey 2.
I very much like the mouth animation though. It is so much more detailed than those in Maniac Mansion.

Derrick Reisdorf - Sep 01, 2015 at 10:48

...I'm not sure how I feel about the head bob.
I know it would take more work, but perhaps a slight head nod and/or wobble and moving jaw would be less jarring. This head bob should be used only in rare occasions where extreme emphasis (or coughing or hiccupping) is required.
Otherwise, I would say just "86" the head bob.

Derrick Reisdorf - Sep 01, 2015 at 10:49

...or is this going to look completely different in the end, and we're just nitpicking?

Ron Gilbert ✓ - Sep 01, 2015 at 11:08

Right now the head is bobbing 1px, it can't get more 'slight' without a lot of work. As with everything, we live with it for a while and tweak and change.

Derrick Reisdorf - Sep 01, 2015 at 15:31

Without adding a bunch of new head sprites (head facing front slightly tilted up, head facing front slightly tilted down, head facing front slightly to the side, etc.) to make it seem more of a "nod" than a "bob", I think we could reduce the frequency of the nod quite a bit. Only use it when real emphasis is needed?
If hooked up with lip sync program, I wonder if you could implement the bob based on inflection/volume of the voice...?

Phaze - Sep 02, 2015 at 15:22

I think the head bobbing atm is a bit extreme. TBH it doesn't look as good as everything else does. Now i realise it is just a work in progress, but I think it should be a lot more subtle. Just my 2 cents.

Otherwise the scene looks fantastic! I especially love the final icons. ;)

Roman - Sep 01, 2015 at 11:32

well...nice to see the talking...good work...regarding the head movement..well...(ok ok it will be optimized in the future)...but at the moment it looks like having hiccups.....or just a gfx glitch...

urielz - Sep 01, 2015 at 12:50

Ok, I would say lip-syncing doesn't really matter for this type of game... but I'll go along and trust Ron if he says lip-syncing is very important and makes a huge difference.

I know it's a first pass, but as others said, I find the head movement not very natural.

Love the art of that 'room'!

Derrick Reisdorf - Sep 01, 2015 at 15:39

It would be REALLY ODD if the mouth didn't animate to match the recorded voice.

urielz - Sep 01, 2015 at 19:23

Of course, but there's a big gap between no animation and lip-syncing.

Derrick Reisdorf - Sep 02, 2015 at 14:50

I'm not sure what you're trying to say.

Let me preface what I'm about to say by stating:
1) The characters' mouths will be animated in the game
2) The game will include recorded audio of voiced dialogue.

Ron feels like the mouth animation should be lip-synched to the audio recordings of voiced dialogue.
Instead of manually trying to make the animations match the voice recordings, he wanted to find an automated lip-synching solution that would do the work for him.
If the mouth animations were not synced with the voice recordings, that would be very odd.
IF there was no voice planned for the game, then lip-synching wouldn't matter very much (since it would be difficult to gauge the speed at which the player reads the text, anyway).

urielz - Sep 02, 2015 at 15:21

Yes, I understand what’s the goal here. All I’m saying is that I think that it might suffice having voice over random mouth movement, as long as the mouth starts/stops moving with the audio. Since the mouth is composed of a dozen pixels I don’t think it would matter much if lip-syncing is not implemented, but I could be totally wrong :)

Peter Campbell - Sep 01, 2015 at 14:01

Is that 2nd icon a welding mask?

"P.S. : Ignore the icons!"

Oh crap, sorry!

Betta - Sep 01, 2015 at 14:09

Mhmm...the head movement looks a bit too weird to me

Cazzeris - Sep 01, 2015 at 14:18

The head-bobbing doesn't look that unnatural in my opinion, but the mouth animations are definitely weird. When the character isn't talking, the mouth is a black line which doesn't resemble a pair of closed lips as properly as Maniac Mansion did. The thing gets actually awkward when Antonio starts talking and this line turns into an unidentifiable black mess that looks nothing like a mouth. Actually, I think that the character art in Maniac Mansion felt a bit more detailed in a way that the protagonists were easily recognisable. Let's hope all the animations TP will have make the game's characters feel a bit more "alive", since they look somewhat grumpy at the moment.

The environmental art looks as great as expected, even though it'd be better if the water was animated Woodtick-style.

Mattias Cedervall - Sep 01, 2015 at 15:19

I agree about the lips, but it's not final art.

Derrick Reisdorf - Sep 01, 2015 at 15:37

I think it looks fine. It looks like a closed mouth with the shadowed bottom lip.

Flo - Sep 01, 2015 at 15:47

Did you have a look at Sphinx? I only used it for recognition of simple voice commands, but theoretically it should be able to recognize phonemes and their timing, too:
http://cmusphinx.sourceforge.net/wiki/phonemerecognition

Sushi - Sep 01, 2015 at 15:51

Just a thought you could try during your endless tweaking: what if you change the random bop position from 20% to something a bit more complex such that when the head is moved up, it stays for at least a few frames in that position. And similar for returning to the normal position. I think the one-frame-bobbing is causing the seasickness.
Alternatively (quicker and shorter to write) you could just assign 20% to *toggle* the head position. That reduces a single bop to 4%.
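The toggle variant described above can be sketched like this in Python. It is only an illustration of the suggestion, with a hypothetical function name and a per-frame roll assumed. Because the head only changes state when a toggle fires, a one-frame bob needs two toggles in a row, which is where the 0.2 x 0.2 = 4% figure comes from.

```python
import random


def head_track(n_frames, rng, toggle_chance=0.2):
    """Return a list of "normal"/"up" positions, one per frame.

    Each frame has toggle_chance of flipping the current position,
    so the head tends to stay put for a few frames after it moves.
    """
    up = False
    track = []
    for _ in range(n_frames):
        if rng.random() < toggle_chance:
            up = not up
        track.append("up" if up else "normal")
    return track


print(head_track(20, random.Random()))
```

One side effect worth knowing before trying it: with a symmetric toggle the head ends up spending about as much time up as normal over a long line, so it is a different feel from the 80/20 split, not just a smoother version of it.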

Mattias Cedervall - Sep 01, 2015 at 16:04

I agree with you because I'm sensitive to seasickness. :-(

Sushi - Sep 01, 2015 at 16:38

Seasickness?

As a fish, I meant to say homesickness!

<drum roll>

Mattias Cedervall - Sep 01, 2015 at 16:54

I understand, Sushi-san.

Stefano E. - Sep 01, 2015 at 19:22

I personally really like it, can't wait to listen to it!

hihp - Sep 01, 2015 at 21:18

My two cents:

Head bobbing looks a bit weird to me, but your mileage may vary. How about adding an option to the settings so you can turn it on and off?

Apart from that: I would not make it truly random, as it looks strange if in two iterations of the same sentence, Reyes does different head-bobbing. Make it pseudo-random by using some sort of checksum of each displayed line as the seed. That way, every line will produce different head-bobbing, but every line will always create the same bobbing pattern.
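A minimal Python sketch of the checksum-as-seed idea, assuming CRC32 as the checksum and the 20% up chance from the post. The function name and the per-frame boolean output are hypothetical.

```python
import random
import zlib


def bob_pattern(line, n_frames, up_chance=0.2):
    """Return a repeatable list of booleans (True = head up) for a dialog line."""
    seed = zlib.crc32(line.encode("utf-8"))  # same text -> same seed
    rng = random.Random(seed)
    return [rng.random() < up_chance for _ in range(n_frames)]


first = bob_pattern("Look at badge", 12)
again = bob_pattern("Look at badge", 12)
print(first == again)  # the same line always bobs the same way
```

Seeding a private `random.Random` per line, rather than reseeding the global generator, keeps the bobbing from disturbing any other randomness in the game.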

Derrick Reisdorf - Sep 02, 2015 at 15:03

Like I had said, the head bob isn't too bad if you want the character to really emphasize what s/he is saying (or possibly be raising his/her voice). So, it shouldn't be as frequent as we see in the video.

In this example, only do the head bob for the words that are capitalized:
I need to go to the bathroom.
We'll be in town in about 5 minutes.
I mean I REALLY need to go to the bathroom.

It's "rank AND file", not "rank IN file"... idiot.

RainerG - Sep 01, 2015 at 22:46

The head bobbing looks strange to me too. Especially the second time "Look at badge", when the head moves 7 times. The first "Look at badge" with 3 head movements is far better.

Ron Gilbert ✓ - Sep 01, 2015 at 23:22

Due to the harsh critical response we've gotten to the bobbing heads, we're going to cut the bobbing heads. Actually, we're just going to cut heads altogether.

http://images.thimbleweedpark.com/noheads.png

hihp - Sep 01, 2015 at 23:43

You missed the grad in the corpse!

hihp - Sep 01, 2015 at 23:44

*head

Derrick Reisdorf - Sep 02, 2015 at 00:22

Use tuna with headless neck.

Zak Phoenix McKracken - Sep 02, 2015 at 03:41

It's sooooooooooo romantic... The detective has lost his head because he fell in love....

longuist - Sep 02, 2015 at 03:58

heads up. cut!cut!cut!
I'm really looking forward to a proper trachea/esophagus animation.

Bogdan Barbu - Sep 02, 2015 at 04:27

You're starting to sound like my girlfriend.

Bogdan Barbu - Sep 02, 2015 at 04:28

Sorry, that's inappropriate but there's no Edit button. :p

Mario - Sep 02, 2015 at 05:48

YES! THE CHAINSAW WORX!!!!!!!!!!!!!!

Grafekovic - Sep 02, 2015 at 06:46

The corpse looks like Weird Ed. Why did you kill him?

Mattias Cedervall - Sep 02, 2015 at 15:42

Here's a big and warm hug from me to you, Ron! ;-)

Arto - Sep 03, 2015 at 12:46

I think this is a good solution, as it solves not only the problem with boobing (made a typo but will leave it as it is...) but also the challenges with lip syncing and blinking eyes.

If the characters would be invisible, then you wouldn't need to spend time animating at all.

ekt - Sep 02, 2015 at 03:14

Not in love with that head bobbing either. It looks like a severe hiccup attack. To me it's because the head comes back to the initial position too quickly. Also maybe 1 pixel is too much (but would subpixel work for your art?).
The background though. The lighting. Oh my goddess of palettes. It's wonderful.

Dan - Sep 02, 2015 at 05:28

The hiccup can be medicated by just reducing the speed of the bobbing. In my opinion the restrictions of the low resolution should be complied. If you want the unadulterated feeling of a classic adventure game from the early nineties, you are forced to do so. And I like it.
If you used subpixels, it would result in a kind of soap opera effect, which would mean that the moving object sets itself apart from the surroundings.
The only reasonable way to make it look more smooth is to use 2 pixels instead of one. Though Ron announced to make it reduced from how it was in Monkey Island, so he makes use of a single pixel. I understand that. On the other hand the heads are bigger this time, so the proportion of 2 px looks smaller relating to the head. Therefore 2 px could be still a solution.

Dan - Sep 02, 2015 at 06:22

I have to clarify that I'm okay with 1 px. I wrote "solution" just because of the critical response here.

ekt - Sep 03, 2015 at 08:55

Dan, I do agree subpixel would be wrong

Daniel Wolf - Sep 02, 2015 at 04:33

I did some more digging into automatic lip-sync software and I found something that seems to fit your bill perfectly: Annosoft offers a lip-sync SDK that works both with C++ and through ActiveX. You give it a sound file and it returns raw lip-sync data in a simple format. You can also give it a dialog string *in addition* to the voice recording and tell it what language it's in. The system will then use the dialog text to create better mouth positions.

They have a free demo application that demonstrates the SDK. I just tested it with some voice recordings and it seems to do a decent job. The results tend to look a bit mechanical, but that may be less of a problem when the mouths are only a few pixels big.

The pricing is US$ 3000 to 3500 per game. Might be worth checking out: http://www.annosoft.com/lipsync-sdks

Bogdan Barbu - Sep 02, 2015 at 06:28

*Papagayo* seems to do pretty much the same but for *free*.

Ron Gilbert ✓ - Sep 02, 2015 at 10:48

$3000! OK, that's totally not going to happen. Too bad, it looks like a nice solution.

Bogdan Barbu - Sep 02, 2015 at 18:44

What about the one I offered?

Ron Gilbert ✓ - Sep 03, 2015 at 10:31

It looks too labor intensive. I need a process where I feed a bunch of audio files in and it spits out lip-syncing data. There are way too many lines of dialog to hand tweak them, or even load and save each one. It's got to be a batch process. If I can't find one, then there won't be lip-syncing. Lip-syncing is important, but it's not worth spending thousands of dollars on; we don't have the money in the budget.

jokie - Sep 15, 2015 at 12:30

There is a product called DarkVOICES that can convert a batch of WAVs to lipsync data and it costs $50. That should be affordable :)

I can't vouch for the quality though, I haven't tried it.

Oliver - Sep 02, 2015 at 06:59

I love the head bobbing and the huge heads and I would love to have lip syncing with the voice audio too.
Everything looks absolutely great!
This is the first Kickstarter project I am in where everything that is shown from the game during the development process makes me more and more certain that I not only made the right decision supporting it, but that you guys are doing it absolutely right.
I noticed that there are people criticising and giving hints (on the lip syncing or the head bobbing). At first I thought they had to be mad or something, but then I noticed that this, like most other topics, will polarize people. Which means there is absolutely NO SENSE in giving suggestions, because for every strong opinion there is at least one other strong opposite opinion.
So I suggest everyone should stop giving any suggestions from now on; everything else would be equal to a self-declaration of being irrational, which would render any given statement worthless.

Thanks bye.

Zak Phoenix McKracken - Sep 03, 2015 at 07:31

If my memory is still good enough, this is the first time that so many people "criticize" an implementation (and that is not in its final state, but still in development).
I think these comments are only signals, a sort of poll among the most valuable audience in adventure games.
They can be ignored, or considered by the authors, in order to pursue what they think is the best possible final result.

Dan - Sep 03, 2015 at 08:08

I think every suggestion has its right to exist, given that it's a suggestion for improvements.
Any criticism should be factual and constructive. There is no place for sheer nagging.
The team should keep cool.

stderr - Sep 02, 2015 at 08:10

"If you know of any software that can pre-process audio files and produce a lip-sync track, let me know."

Does it have to be audio files? Maybe it would be easier to video-record the voice-actors and track their lips while they say their lines.

Paulup - Sep 02, 2015 at 09:46

I think the head bobbing is fine, surprised at people's reactions to it...
The mouth looks great - adding animations really bring the character to life and give more of an idea what it's all meant to be like.

Jammet - Sep 02, 2015 at 10:37

Oooh yeah! This is just what the doctor ordered! I love the headbobbin'! As for lipsyncing, you could just use the raw text, feed that to the lipsync parser, and it'll shape the mouth by using the respective bitmap you assigned for each letter or, maybe syllable.

Bogdan Barbu - Sep 03, 2015 at 16:50

And someone will speak over the animation at exactly the right pace? It has to go the other way around...

Marcel Taeumel - Sep 02, 2015 at 13:16

There should be lip-syncing at least for phrases like "Aaaaahhhhhhhh!!!!!" :)

Sushi - Sep 02, 2015 at 14:16

What about us taking the Papagayo Python code (GPL) and saying "fuck it, I'll write my own."? Or rather "I'll write some extensions to improve the automatic matching to audio files." It seems you want to limit or even avoid manual fine tuning of the audio script to the mouth positions...
So, any backers out there who want to give that a try and see if we can really back Ron on this? Before he cuts even more stuff. He also cut the verbs and the inventory and those nice icons in case you did not notice.

Bogdan Barbu - Sep 03, 2015 at 08:37

The more lengthy the task, the more expensive manual labour gets. With automated tools, you only have to pay a constant amount, regardless of the length of the task. That's why they want to limit manual tuning or avoid it altogether if they can. However, what you're proposing (i.e., building the perfect tool from scratch or even improving a separate one) is more prohibitive than having a human tune the output of some imperfect tool. This sort of work is pretty complicated and involves a lot of complex mathematics. You won't have an easy time finding people who have the necessary training. Just to give you an idea, as a consultant, I won't charge less than $250/hour for this type of work---and it'll take quite a while to get something that's close to perfect. It's the sort of thing you might want a separate Kickstarter for. :)

The nice "3D" icons weren't cut, it's just that they didn't get around to making the final versions for all of them yet. Many are still just wireframes for now. Or maybe, since you've mentioned the verbs, you're making a joke about the picture Ron posted in the comments section (even though you replied to the main post rather than his comment)?

Sushi - Sep 03, 2015 at 16:59

Of course I was joking about the cutting!

I know it would cost a lot of money, and 3k doesn't sound outrageous at all to me (in my line of work, yearly licenses cost easily tenfold); so I was saying we'd do it for free. As in charity. As in voluntary. And as in possible eternal fame, being part of TP other than a prepaid entry in the phone book.

Sorry for replying to your other comment below in this one as well, but I'm lazy. And the mathematics of the seckrit question is really hard. ;)

Mike McP - Sep 03, 2015 at 17:44

If the market really bears $3-3.5k for this kinda tool, and Ron thinks he could build something semi-working with XX hours of labor, then maybe it's something he spins off as a product on grumpygamer to recoup some labor costs.

Personally, in an age where crappy Office suite costs $300, and has barely changed in 12yrs, $3k for something this specialized didn't sound unreasonable to me. To build something to parse an audio stream and recognize certain sounds and inflections, that sounds hella complex (but oddly, fun..)

Markus - Sep 17, 2015 at 09:48

Actually there are pretty straightforward ways to do it. You can find vowels and stuff like that with certain techniques used in voice recognition.

Bogdan Barbu - Sep 03, 2015 at 08:41

Why do you think a license to use the other tool is $3000? They know the value of their product.

Heinz - Sep 02, 2015 at 17:06

Just compared the screenshot from the about section and the video in this article- it's amazing the progress so far. Congratulations.

Herbert - Sep 03, 2015 at 17:13

Good you wrote this. I also had a look, and yes, there has been a lot of development. Wonderful artwork!

It’s just that… the “old” font doesn’t fit any more. The picture looks like from 1992, the font from 1987.

Farooq - Sep 03, 2015 at 06:19

Love the hybrid I notice between the Maniac Mansion and Monkey Island character design styles, resulting in a totally different character design. I LOVE IT RON! I LAAVV IT. :D

Ema - Sep 03, 2015 at 08:17

1) "the presence of an audio track of spoken speech, lip sync is very important": I agree

2) Generally speaking, head bobbing is good; otherwise the head is unnaturally static. Small animations of gestures and other "body language" could improve the natural feeling and help to avoid the look of the "soldier waiting for orders". Nevertheless, I also agree that the bobbing in MI and Indy felt much more natural than this. Maybe you should try the suggestion of not moving the ears: after all, a natural movement is a rocking tilt of the head along the axis of the ears, not a piston-like up-and-down movement. At the moment, the head bobbing is a bit awkward and is worth improving. Maybe the ears thing could be tested.

Ema - Sep 03, 2015 at 08:19

Oops, at the beginning I meant "in the presence of..."
Sorry

Paulup - Sep 03, 2015 at 10:40

The lip-syncing would be easy to do if each character only spoke in one vowel sound:

Ray: Who you shoot, you brute?
Reyes: That rat had bad man plan and ran.

Problem solved.

Domino - Sep 03, 2015 at 11:20

So, have you decided not to use Spine for character animation? Or are you using Spine for something else? :)

Ron Gilbert ✓ - Sep 03, 2015 at 11:23

We were never going to use Spine for character animation, it's not really suited to 8-bit art and maintaining pixel purity. We are thinking about using Spine to do larger cut-scene animations, but we're waiting to see how many of them there are before spending the time to get it working.

Domino - Sep 04, 2015 at 02:30

Cool, thanks for the info!

Although, have you ever considered using e. g. Spine-based animations (not necessarily pixelated at all) and downsample them in real-time? I know that there would be much less control on individual pixels, but I was just wondering if you've ever considered that approach (and if there were any other reasons why you would rule out that option). Because, in the end, if it's easier to make and requires much less memory, maybe it's worth thinking about?

pumbaa - Sep 03, 2015 at 11:40

Thanks Mark Ferrari.
My Perlerbeads-Art is now outdated!!!

Will it become worthless or increase its value?

http://img.pr0gramm.com/2015/02/02/a65bd717edde73b0.jpg

Bogdan Barbu - Sep 03, 2015 at 16:51

For a second there, I was worried I might have a problem because I comment quite a lot on this blog. Thanks, I needed this.

Zombocast - Sep 03, 2015 at 14:14

I remember trying to make Source videos in Garry's Mod and trying to get vowels to sync with the lips, using audio clips from Audacity. Source Filmmaker made videos cleaner and easier to sync up.
With that said, the only issue I see with the head bob is that the body is now too static; it needs the arms bent upwards with that stance.

Zombocast - Sep 03, 2015 at 22:28

The program's name was Face Poser, a developer tool within the Source SDK.
You're on the right track to make a small list of Vowels and assign them facial movements.
Hold them for various lengths of time to form words. Here's an example:
https://youtu.be/9DqQQ7p1d9k?t=457
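
A rough sketch of the "hold each mouth shape for a length of time" idea, in Python. The shape names and hold times below are invented for illustration; this is not how Face Poser or the Thimbleweed Park engine stores its data.

```python
REST_SHAPE = "rest"  # assumed name for the closed/neutral mouth


def shape_at(timeline, t):
    """Return the mouth shape that should be shown at time t (seconds).

    timeline is a list of (shape_name, hold_seconds) pairs played back
    to back. Before the start or after the end, the mouth is at rest.
    """
    if t < 0:
        return REST_SHAPE
    elapsed = 0.0
    for shape, hold in timeline:
        elapsed += hold
        if t < elapsed:
            return shape
    return REST_SHAPE


# Hypothetical hand-authored track for the word "hello":
# a short "E", a brief "L", then a longer "O".
HELLO = [("E", 0.10), ("L", 0.05), ("O", 0.25)]

if __name__ == "__main__":
    for t in (0.0, 0.12, 0.3, 1.0):
        print(t, shape_at(HELLO, t))
```

The game loop would call `shape_at` once per frame with the time elapsed since the line started playing, and pick the matching mouth bitmap.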

Zombocast - Sep 03, 2015 at 22:32

P.S you should make Gary write it and call it Gary's Mod. HAHA

thiezn - Sep 04, 2015 at 03:46

Wow, I really like how the character evolved. He really fits into the background now. He's come a long way since Mark's first backgrounds were introduced to us.

DZ-Jay - Sep 04, 2015 at 05:33

Hmmm... I know what you are trying to do, Ron, and you are right in that without any bobbing, it looks too static.  However, I don't think the current two-step implementation is the way to go; it's rather jarring, just a bit too brusque.  I see the lips moving, which gives me a particular mental model of what's going on, and then once in a while... BAM! that head just jerks a few pixels.  It actually looks like it stretches vertically.

Good idea, but the implementation could use some subtlety.  :)

   Cheers!
      -dZ.

Zak Phoenix McKracken - Sep 04, 2015 at 08:40

[OFF TOPIC]
Pardon... no friday-post-to-post-your-question today?
[/OFF TOPIC]

Paulup - Sep 04, 2015 at 09:06

I guess there were plenty of questions left unanswered from the last one?

Derrick Reisdorf - Sep 04, 2015 at 10:22

Oh, and I'm sure Noah Falstein is busy making the big bucks at Google.

Zak Phoenix McKracken - Sep 04, 2015 at 17:37

Lucky him... but I was interested in Aric Wilmunder :-)

Zombocast - Sep 04, 2015 at 22:00

It's Friday, Friday, Gotta big frown on Friday
Everybody's lookin' forward to the
weekend, weekend podcast. - Ransom the clown

Alex from Grmny - Sep 07, 2015 at 08:15

I'm going with what someone else has posted here above me.
The head bobbing looks weird.
I wouldn't say it's tweakable.
It's the size of the head plus the bobbing in my opinion.

I think in MI, the talking animations had their own charm and they sure were noticeable as somewhat odd, but never in a bad way, rather in a good way. This worked because the heads weren't as huge as in Thimbleweed Park.
I couldn't say the same about what you're showing right now. Maybe going without the head bobbing as in Maniac Mansion is the better thing to do. The mouth movement itself looks great at first glance, no need for lip synching.

ac - Sep 23, 2015 at 07:39

I think the character's head is a couple of pixels too large in both dimensions in this screenshot, so it looks like it is closer up than the rest of the scene implies. Also the mouth is too open while talking (unless he's screaming perhaps).