Walking and Talking
Aug 31, 2015
Quick blog post today.
Spent much of the weekend and today rewriting my engine's animation system to give it more flexibility for doing walking, talking and other multi-layered animations.
The main issue is wanting to play a talking animation, which moves the mouth and head, but then also wanting to play other animations, like walking, pointing, shrugs and other gestures. If every animation has to have a version with talking, the permutations quickly make you want to bang your head repeatedly against your desk.
SCUMM had a great animation system called Byle. It could play several animations simultaneously, and as long as they used different layers, it would play them all at once. It made things like playing a talking animation and then playing a shrug on top of it really easy. It was also easy for the artist, because they just had an attach point for the head and could move the body as needed.
This stuff is pretty routine for 3D animation systems or 2D animation systems like Spine, but for pure bitmap graphics, I have yet to find a good one. So, as often happens, when I can't find a tool that works, I just say fuck it, I'll write my own.
Gary started by chunking Reyes' animations into head, body and mouth layers, and we went through several iterations of how best to layer them for maximum flexibility and (more importantly) ease of creation.
Our system also leaves the option of lip-syncing open, if we decide to do that. The system understands the basic vowel positions and can set the mouth frame based on external input. Right now it's just a weighted random number, but if we had lip-sync data, it would be fed in instead.
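A minimal sketch of that kind of weighted pick; the MouthFrame names, weights and function here are made up for illustration, not the engine's actual code:

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical mouth frames roughly matching the basic vowel positions.
enum class MouthFrame { Closed, A, E, O, U };

// Weighted random pick of the next mouth frame. The weights are invented;
// a closed/neutral mouth is favored so the mouth isn't flapping wide open
// on every tick.
MouthFrame randomMouthFrame() {
    struct Entry { MouthFrame frame; int weight; };
    static const std::vector<Entry> table = {
        { MouthFrame::Closed, 4 }, { MouthFrame::A, 3 }, { MouthFrame::E, 2 },
        { MouthFrame::O, 2 },      { MouthFrame::U, 1 },
    };
    int total = 0;
    for (const auto& e : table) total += e.weight;
    int roll = std::rand() % total;
    for (const auto& e : table) {
        if (roll < e.weight) return e.frame;
        roll -= e.weight;
    }
    return MouthFrame::Closed;  // not reached; keeps the compiler happy
}

// If real lip-sync data existed, the same slot would be driven by it instead,
// e.g. something like: mouth = lipSyncTrack.frameAt(currentAudioTime);
```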
I don't know if we'll do lip syncing; it's an amazing amount of work unless it's automated, and I haven't looked into the current state of automated lip-syncing. Eight years ago it was crap, but we now live in a future of self-driving cars and staying in strangers' houses instead of hotels, so who knows what other crazy things have been invented.
If you know of any software that can pre-process audio files and produce a lip-sync track, let me know. It doesn't have to do it in real time; it can (and probably should) be a pre-processing stage.
Gary and I also decided to do head bobbing along with mouth movement, but reduced from what was in Monkey Island. Back at Lucasfilm, when we went from the large-headed characters to more realistically proportioned heads, the mouth became a single pixel and it was hard to tell if Guybrush was even talking.
With the larger stylized heads in Thimbleweed Park, it's easy to tell if they are talking, but having no movement of the head felt too static, so the plan we ultimately settled on was to have two head positions: normal and up. The head is in the normal position 80% of the time and randomly pops to the up position the other 20%. I think it works well, but I'm sure we will endlessly tweak it in the coming months.
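A tiny sketch of that 80/20 pick, again with invented names rather than the engine's actual code:

```cpp
#include <cstdlib>

enum class HeadFrame { Normal, Up };

// Pick the head position for the next talking frame: normal ~80% of the
// time, up ~20% of the time, per the ratio described above.
HeadFrame nextHeadFrame() {
    return (std::rand() % 100) < 20 ? HeadFrame::Up : HeadFrame::Normal;
}
```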
As with a lot of what we post, what you're seeing is not final animation, just something we put together as a test, so don't nitpick. There is still a lot of work to do.
- Ron
P.S. IGNORE THE ICONS!
http://www.lostmarble.com/papagayo/index.shtml
is free lip-syncing software written in Python for Windows and Mac... but maybe Python isn't suited to your needs...
amazing blog! Tnx!
I would figure the head-bob likelihood increases when saying 'a' rather than 'm'?
SURELY everyone will notice. Still, it makes you feel good.
any special annotations and produce good output. Of course, they would only use the output metadata (or use some custom code to produce the data; it's open source for licensees), without needing the 3D part. The actual benefit is in the audio processing.
http://www.annosoft.com/lipsync-sdks
but I've no idea of their quality.
Look at this video of King's Quest VI from 1992: <https://youtu.be/fEUFzSb5utk>. There's a lot of dialog right at the start. From what I understand, all the lip-sync data was extracted automatically from the voice recordings. I'm not sure if the company (Bright Star Technology) is still around, but you might try contacting Elon Gasper, who founded the company and apparently still holds the patents. <https://www.linkedin.com/in/elongasper> Even if you can't use their technology, he might have an idea as to the current state of the art.
Maybe I'm too tied to the Maniac Mansion style: big head, only mouth movement.
I think something like the "version 2" of Maniac Mansion should be enough: mouth movement, but a fixed head. When the character is in side view, it can move its jawbone up and down.
Besides, there is something retro about the characters randomly moving their mouths when talking. I like it. :)
Regarding the talking animation, I think the mouth looks better than I was expecting, but the head bobbing is turning out to be not so great. Although I understand the animation in theory, to me it doesn't register as the head slightly changing orientation, but as if something weird and disturbing is happening to it.
Sorry.
Push Edit Button : "It doesn't seem to work."
I think they do English, French, German, Spanish and Italian.
They used to deliver .anim files for our pipeline, but you can really ask them for any kind of output.
Might be worth a try.
How about keeping the ears in a constant position, in order to prevent the impression of a stretching neck?
Besides, you could individualize the different characters. For example, smart-alec people tend to raise their eyebrows, nervous people tend to blink frequently when they're talking, and narcissistic people often smile while they're speaking.
http://www.court-records.net/animation/armstrong-sweating%28b%29.gif
http://www.court-records.net/animation/moe-tsk%28b%29.gif
http://www.court-records.net/animation4/brushel-smile%28b%29.gif
If it were just up to me, I wouldn't worry about the lip-synching; it might not add much to the game relative to the work of developing it, unless there's something that will easily work against the audio files.
As a middle ground, I might be tempted to have two versions of the speech text: one version that's displayed, and another one that could be configured with a couple of extra codes (like "[speed=1]" or "[pause=10]", etc., to be parsed with the animation) for pacing at more pertinent or repeated phrases and sentences.
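A rough sketch of what parsing such pacing codes might look like, going by the "[speed=…]"/"[pause=…]" syntax above; the struct and function names are invented for illustration:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// A pacing event found in the annotated text: which code, its value, and
// the character position it applies to in the stripped display string.
struct PacingEvent {
    std::size_t position;
    std::string code;   // "speed" or "pause"
    int value;
};

// Strips codes like [speed=2] or [pause=10] out of an annotated line,
// returning the clean display text and collecting the pacing events.
std::string stripPacingCodes(const std::string& line,
                             std::vector<PacingEvent>& events) {
    static const std::regex codeRe(R"(\[(speed|pause)=(\d+)\])");
    std::string display;
    std::size_t last = 0;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), codeRe);
         it != std::sregex_iterator(); ++it) {
        display += line.substr(last, it->position() - last);
        events.push_back({ display.size(), (*it)[1], std::stoi((*it)[2]) });
        last = it->position() + it->length();
    }
    display += line.substr(last);
    return display;
}
```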
There, I said it!
Hurry up, just nine months left ;-)
But this time, quite frankly, I think the head bobbing doesn't fit at all. I would rather the character blinked or performed some other "tiny" movement. The head bobbing appears way too exaggerated to me, and in normal life nobody moves their head that much when talking.
Guess they'll adjust this as they proceed...
I really respect your dedication and work, Ron. Also, I really appreciate that you are sharing your early, rough experiments with us.
That was just my personal consideration and feedback on "how" the head movement looked.
I didn't mean to be disrespectful at all. In fact, I know exactly how the "continuous refinement" process of software (and games) works. I should say that I've been taking the same approach to developing software for a long time.
We just have to be patient, yep.
Just dial back the head bob, add random blinking, and move on.
I very much like the mouth animation, though. It is so much more detailed than the mouth animations in Maniac Mansion.
I know it would take more work, but perhaps a slight head nod and/or wobble and a moving jaw would be less jarring. This head bob should be used only on rare occasions where extreme emphasis (or coughing or hiccupping) is required.
Otherwise, I would say just "86" the head bob.
If hooked up with a lip-sync program, I wonder if you could implement the bob based on the inflection/volume of the voice...?
Otherwise the scene looks fantastic! I especially love the final icons. ;)
I know it's a first pass, but as others said, I find the head movement not very natural.
Love the art of that 'room'!
Let me preface what I'm about to say by stating:
1) The characters' mouths will be animated in the game
2) The game will include recorded audio of voiced dialogue.
Ron feels like the mouth animation should be lip-synched to the audio recordings of voiced dialogue.
Instead of manually trying to make the animations match the voice recordings, he wanted to find an automated lip-synching solution that would do the work for him.
If the mouth animations were not synced with the voice recordings, that would be very odd.
IF there was no voice planned for the game, then lip-synching wouldn't matter very much (since it would be difficult to gauge the speed at which the player reads the text, anyway).
"P.S. : Ignore the icons!"
Oh crap, sorry!
The environmental art looks as great as expected, even though it'd be better if the water was animated Woodtick-style.
http://cmusphinx.sourceforge.net/wiki/phonemerecognition
Alternatively (quicker and shorter to write), you could just assign a 20% chance to *toggle* the head position. That reduces a single bob to 4% (an up-toggle followed immediately by a down-toggle: 0.2 × 0.2 = 4%).
As a fish, I meant to say homesickness!
<drum roll>
Head bobbing looks a bit weird to me, but your mileage may vary. How about adding an option to the settings so you can turn it on and off?
Apart from that: I would not make it truly random, as it looks strange if in two iterations of the same sentence, Reyes does different head-bobbing. Make it pseudo-random by using some sort of checksum of each displayed line as the seed. That way, every line will produce different head-bobbing, but every line will always create the same bobbing pattern.
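A small sketch of that seeding idea, with invented names (nothing here is confirmed engine code): hash the displayed line, seed a generator with it, and draw the 20% head-up rolls from that generator.

```cpp
#include <functional>
#include <random>
#include <string>

// Deterministic "random" head bobbing: seeding with a hash of the spoken
// line means the same line always bobs the same way, while different
// lines bob differently.
class HeadBobSequence {
public:
    explicit HeadBobSequence(const std::string& line)
        : rng_(static_cast<std::mt19937::result_type>(
              std::hash<std::string>{}(line))) {}

    // True means "head up" for this step (20% chance, as in the post).
    bool nextBobIsUp() {
        return std::uniform_int_distribution<int>(0, 99)(rng_) < 20;
    }

private:
    std::mt19937 rng_;
};
```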
In this example, only do the head bob for the words that are capitalized:
I need to go to the bathroom.
We'll be in town in about 5 minutes.
I mean I REALLY need to go to the bathroom.
It's "rank AND file", not "rank IN file"... idiot.
http://images.thimbleweedpark.com/noheads.png
I'm really looking forward to a proper trachea/esophagus animation.
If the characters would be invisible, then you wouldn't need to spend time animating at all.
The background, though. The lighting. Oh my goddess of palettes. It's wonderful.
If you used subpixels, it would result in a kind of soap opera effect, which would make the moving object set itself apart from its surroundings.
The only reasonable way to make it look smoother is to use 2 pixels instead of one. Though Ron said he wants it reduced from how it was in Monkey Island, so he uses a single pixel. I understand that. On the other hand, the heads are bigger this time, so 2 px looks proportionally smaller relative to the head. Therefore 2 px could still be a solution.
They have a free demo application that demonstrates the SDK. I just tested it with some voice recordings and it seems to do a decent job. The results tend to look a bit mechanical, but that may be less of a problem when the mouths are only a few pixels big.
The pricing is US$ 3000 to 3500 per game. Might be worth checking out: http://www.annosoft.com/lipsync-sdks
I can't vouch for the quality, though; I haven't tried it.
Everything looks absolutely great!
This is the first Kickstarter project I'm in where everything shown from the game during the development process makes me more and more certain that I not only made the right decision in supporting it; I think you guys are doing it absolutely right.
I noticed that there are people criticizing and giving hints (the lip-syncing or the head bobbing). At first I thought they had to be mad or something, but then I noticed that this, like most other topics, will polarize people. Which means there is absolutely NO SENSE in giving suggestions, because for every strong opinion there is at least one strong opposite opinion.
So I suggest everyone stop giving suggestions from now on; anything else would amount to a self-declaration of being irrational, which would render any given statement worthless.
Thanks bye.
I think these comments are only signals, a sort of poll among the most valuable audience for adventure games.
They can be ignored, or considered by the authors, in order to pursue what they think is the best possible final result.
Any criticism should be factual and constructive. There is no place for sheer nagging.
The team should keep cool.
Does it have to be audio files? Maybe it would be easier to video-record the voice-actors and track their lips while they say their lines.
The mouth looks great - adding animations really brings the character to life and gives more of an idea of what it's all meant to be like.
So, any backers out there who want to give that a try and see if we can really back Ron on this? Before he cuts even more stuff. He also cut the verbs and the inventory and those nice icons in case you did not notice.
The nice "3D" icons weren't cut, it's just that they didn't get around to making the final versions for all of them yet. Many are still just wireframes for now. Or maybe, since you've mentioned the verbs, you're making a joke about the picture Ron posted in the comments section (even though you replied to the main post rather than his comment)?
I know it would cost a lot of money, and 3k doesn't sound outrageous at all to me (in my line of work, yearly licenses easily cost tenfold); so I was saying we'd do it for free. As in charity. As in voluntary. And as in possible eternal fame, being part of TP as something other than a prepaid entry in the phone book.
Sorry for replying to your other comment below in this one as well, but I'm lazy. And the mathematics of the seckrit question is really hard. ;)
Personally, in an age where a crappy Office suite costs $300 and has barely changed in 12 years, $3k for something this specialized didn't sound unreasonable to me. Building something to parse an audio stream and recognize certain sounds and inflections sounds hella complex (but oddly, fun...).
It’s just that… the “old” font doesn’t fit any more. The picture looks like it’s from 1992, the font from 1987.
2) Generally speaking, head bobbing is good; otherwise the head is unnaturally static. Small gesture animations and other "body language" could improve the natural feeling and help avoid the look of a "soldier waiting for orders". Nevertheless, I also agree that the bobbing in MI and Indy felt much more natural than this. Maybe you should try the suggestion of not moving the ears: after all, a natural movement is a tilt of the head along the axis of the ears, not a piston-like up-and-down movement. At the moment, the head bobbing is a bit awkward and worth improving. Maybe the ears thing could be tested.
Sorry
Ray: Who you shoot, you brute?
Reyes: That rat had bad man plan and ran.
Problem solved.
Although, have you ever considered using e.g. Spine-based animations (not necessarily pixelated at all) and downsampling them in real time? I know that there would be much less control over individual pixels, but I was just wondering if you've ever considered that approach (and if there were any other reasons why you would rule out that option). Because, in the end, if it's easier to make and requires much less memory, maybe it's worth thinking about?
My Perlerbeads-Art is now outdated!!!
Will it become worthless or increase its value?
http://img.pr0gramm.com/2015/02/02/a65bd717edde73b0.jpg
With that said, the only issue I see with the head bob is that the body is now too static; it needs the arms bent upwards with that stance.
You're on the right track making a small list of vowels and assigning them facial movements.
Hold them for various lengths of time to form words. Here's an example:
https://youtu.be/9DqQQ7p1d9k?t=457
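In engine terms, that amounts to playing back a pre-timed track of mouth shapes against the audio clock. A minimal sketch, with made-up types rather than anything from the actual game:

```cpp
#include <string>
#include <vector>

// One entry of a pre-processed lip-sync track: which vowel/mouth shape to
// hold and for how long. All names here are illustrative.
struct TimedViseme {
    std::string vowel;    // e.g. "A", "E", "O", or "closed"
    double duration;      // seconds to hold this mouth shape
};

// Return the mouth shape to show at time t (seconds into the line).
std::string visemeAt(const std::vector<TimedViseme>& track, double t) {
    double elapsed = 0.0;
    for (const auto& v : track) {
        elapsed += v.duration;
        if (t < elapsed) return v.vowel;
    }
    return "closed";  // past the end of the line: keep the mouth shut
}
```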
Good idea, but the implementation could use some subtlety. :)
Cheers!
-dZ.
Pardon... no friday-post-to-post-your-question today?
[/OFF TOPIC]
Everybody's lookin' forward to the
weekend, weekend podcast. - Ransom the clown
The head bobbing looks weird.
I wouldn't say it's tweakable.
It's the size of the head plus the bobbing in my opinion.
I think in MI the talking animations had their own charm, and sure, they were noticeable as somewhat odd, but never in a bad way, rather in a good way. This worked because the heads weren't as huge as in Thimbleweed Park.
I couldn't say the same about what you're showing right now. Maybe going without the head bobbing, as in Maniac Mansion, is the better thing to do. The mouth movement itself looks great at first glance; no need for lip-synching.